Categories: Sanger Science | 16 August 2022 | 16.1 min read

From genomes to vaccines: lessons from the pneumococcus

For more than 70 years, scientists have been engaged in a real-world version of ‘Whac-a-Mole’, although unlike the once-popular arcade game, the stakes could not be more serious.

The ‘mole’ in this case is the bacterial pathogen Streptococcus pneumoniae, which causes diseases ranging from ear infections to pneumonia, septicaemia and meningitis. Also known as the pneumococcus, this bacterium infects around 9 million people globally each year, with elderly adults and children particularly susceptible. Pneumococcal infection kills more than 300,000 children each year, mainly in low- and middle-income countries.

A number of factors make the pneumococcus a distinctly tenacious foe. It is commonly carried in the nose and throat without any ill effects, meaning that it can lurk quietly in a human population before causing disease when conditions are right. The pneumococcus can also scavenge segments of DNA from nearby bacteria and integrate them into its genome. This is particularly important because it enables the pneumococcus to shuffle or completely replace the genes responsible for producing its capsule, a layer of carbohydrate that surrounds the cell and protects it from physical stress and attacks by the host immune system. The ability to update or entirely change its capsule from time to time enables the pneumococcus to evade immune responses that might otherwise have killed it. Pneumococcal bacteria with distinct capsules are known as serotypes, and the process of changing from one to another is termed serotype switching. The thickness and composition of the capsule also help to determine whether a particular pneumococcal serotype will cause invasive disease.

Fortunately, not everything is tilted in the pathogen’s favour and, just like players of the arcade game, we also have mallets (in the form of antibiotics and vaccines) with which to try and whack disease-causing pneumococcal strains. Since 2000, a series of pneumococcal conjugate vaccines (PCVs) has been deployed, each including capsular polysaccharides from multiple pneumococcal serotypes. The number of serotypes covered by PCVs has expanded over the last twenty years: a vaccine targeting 13 serotypes is now available, and one targeting 25 serotypes is in development. However, there are more than 100 distinct pneumococcal serotypes, and they can affect children and adults in different ways. Knowing which serotypes to target with PCVs, and what the likely impact will be on colonisation in healthy people, invasive disease, serotype switching and antibiotic resistance in the wider pneumococcal population, is vitally important when designing effective global vaccination strategies.

“After vaccination the population refills the empty niche and goes back to the same level of colonisation, but any differences in terms of disease are down to how invasive those remaining serotypes are.”

Dr Nick Croucher,
group leader at Imperial College London and formerly a PhD student at Sanger

Assessing vaccine impact

The first PCV to be licensed, PCV7, was designed to target the seven serotypes most frequently found to cause disease in the US. Although PCV7 had little impact on the overall presence of the pneumococcus, it reduced the incidence of pneumococcal disease across all age groups by 45 per cent over seven years. PCV7 was subsequently rolled out to millions of vulnerable children around the world through Gavi, the Vaccine Alliance. Serotype surveys had shown that the prevalence of different serotypes varied over time and between countries, and it was anticipated that PCV7 would be less successful in places with a high disease burden. However, a detailed picture of the geographical spread of the various pneumococcal serotypes, and of how they might respond to PCV7 and subsequent vaccine formulations, was lacking.

Enter the Global Pneumococcal Sequencing project (GPS), the brainchild of Professor Stephen Bentley, who leads a team at the Sanger Institute, and Professor Keith Klugman, formerly of Emory University and now Director of Pneumonia at the Bill & Melinda Gates Foundation. Over a pint of beer at a London pub, the pair hatched a plan to establish a genomic surveillance programme that would sequence the genomes of over 20,000 S. pneumoniae isolates from around the world, with the primary aim of understanding pneumococcal evolution in response to vaccine introduction in low- and middle-income countries. The project obtained funding in 2011 and began working with partners in a range of countries, including The Gambia, Malawi and South Africa. By March 2021 the programme had sequenced the genomes of 26,100 pneumococcal isolates from 57 countries. These data are archived together with relevant clinical and epidemiological metadata in a public database. This enables analyses not only of serotype evolutionary dynamics in response to vaccination programmes, but also of clinically relevant characteristics such as antibiotic resistance patterns.

With this huge genomic dataset, the GPS team is able to define S. pneumoniae lineages (strains with a recent common ancestry) and begin to study how the introduction of vaccines has influenced the serotypes present among these lineages. Translating genomic data into meaningful findings that can inform public health decisions was built into the project from the beginning, so the GPS team held discussions with partners in many countries to better understand their needs. Key among these were identifying the major lineages circulating in each country, understanding how vaccine introduction changes the pneumococcal population and, importantly, determining its impact on antimicrobial susceptibility.

“In a microbiology lab, serotyping strains and antimicrobial susceptibility testing are time-consuming and resource intensive. Whole genome sequencing can now reliably infer serotype and antibiotic resistance profiles, build phylogenetic trees to understand where outbreaks might be occurring and track which strains mediate serotype replacement. So it’s one test that can answer a lot of different questions.”

Dr Stephanie Lo,
Senior Scientific Manager / Translational Science Lead for the GPS project and Parasites and Microbes programme member at Sanger.

The GPS dataset is also used to contextualise local findings in the global setting. For example, if a strain or serotype is increasing in prevalence in a country, genomics can show whether this is the expansion of a local outbreak, or the result of multiple introductions into the country from other sources, which could necessitate quite different interventions.

In most settings, the pneumococcal strains that dominate in disease are those that have acquired antibiotic resistance, allowing them to survive treatment and transmit more successfully. For this reason, the serotypes targeted in PCV7 and PCV13 were those associated with resistance to front-line antibiotics. However, for reasons that are not yet fully understood, children in low- and middle-income countries are colonised by a broader range of serotypes. So while the original PCV7 covered 90 per cent of all strains in the US, it covered only around 50 per cent of strains found in Gavi-eligible countries. PCV13 raised this to 60-70 per cent, but a coverage gap remains. The growing GPS dataset provides an unprecedented opportunity to close this gap, as PCV updates can now take into account the antibiotic-resistant pneumococcal strains that dominate in specific countries, in particular those that have escaped the vaccine through serotype switching. At the risk of labouring the analogy, genome sequencing is enabling researchers to update and deploy the mallets needed to keep whacking invasive pneumococcal disease as and when it pops up.

“GPS has done a lot of surveillance, which enables us to advise vaccine manufacturers which new serotypes to cover in PCV updates that can be rolled out in low- and middle-income countries at an affordable price and volume.”

Professor Keith Klugman,
Director of Pneumonia at the Bill & Melinda Gates Foundation

Minding the gaps

While GPS created one of the largest and most geographically diverse genomic resources for any human bacterial pathogen, some big gaps remained in the global picture for pneumococcal surveillance. Four countries – India, Pakistan, Nigeria and the Democratic Republic of the Congo – account for half of the global pneumococcal disease burden, yet genome sequencing of isolates from these countries has been limited to date, collectively representing only 5 per cent of sequences in the GPS database. While a variety of economic, technical and political challenges have contributed to this mismatch, expanding genomic surveillance in these places is vital for informing future vaccine formulations to most effectively tackle invasive pneumococcal disease.

Accordingly, in 2019 the Bill & Melinda Gates Foundation extended funding for GPS for a further five years. The aims are to close some of these gaps and to move pneumococcal surveillance onto a more sustainable footing by supporting partners in affected countries to expand their local sequencing and analysis capabilities. Bentley, Lo and colleagues work closely with partners to evaluate what is feasible, surveying local sequencing infrastructure and supporting bioinformatics training where needed. Cost is a major factor. Most partners now have sequencers in place, but for many the cost of the reagents and resources needed to obtain, sequence and analyse samples can be prohibitive. Price is not the only consideration, however: the technological capability and public health benefits of developing local genomic infrastructure will compound over time. As such, GPS adopts a mixed model for filling in the gaps, with some partners sequencing locally and others sending samples to Sanger for sequencing for the time being. The hope is that, in the not-too-distant future, all partners can be supported to develop local sequencing and analysis capabilities.

“The implementation of GPS in India facilitated cross-disciplinary research beyond the original aims of the project. A focused state-of-the-art laboratory facility was established for long-term genomic surveillance, networking, and training. WGS-based surveillance has strengthened therapeutic interventions, vaccine strategies, and public health involvement.”

Dr K.L. Ravikumar,
National Centre for Pneumococcal Vaccine Immunogenicity Evaluation, Kempegowda Institute of Medical Sciences, Bangalore, India

From surveillance to prediction

While genomics is a powerful tool for informing pneumococcal vaccine design, it may, counterintuitively, not always be beneficial to target a given serotype. Removing one serotype could allow a more invasive one to expand into the niche that becomes available. This is a particular concern for serotypes that affect children and adults differently: a vaccine targeting serotypes found predominantly in children may lead to replacement by serotypes that, while less harmful to children, increase invasive disease in vulnerable adult populations. More attention is therefore being given to predicting the consequences of vaccination, with a view to designing vaccines that leave a more benign post-vaccine population.

Dr Nick Croucher, together with Sanger Parasites and Microbes (PaM) programme associate faculty members Professor Caroline Colijn (Simon Fraser University, Canada) and Professor Jukka Corander (University of Oslo, Norway), integrated genome data from GPS into an ecological model and used it to search for vaccine formulations that take account of invasive disease burden after vaccination. Their model suggested that simply targeting the most invasive serotypes may not result in the greatest reduction in serious disease. Instead, the best formulation for a vaccine is likely to depend on which serotypes are circulating in a given population. An optimal vaccination strategy might therefore require vaccines that are individually formulated for different populations and geographic settings.
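Why the most invasive serotypes are not automatically the best targets can be illustrated with a toy calculation. The Python sketch below is not the published model. It assumes, as a simplification in the spirit of Croucher’s comment on niche refilling, that after vaccination the remaining serotypes expand in proportion to their pre-vaccine carriage until total carriage returns to its original level, and that each serotype causes invasive disease in proportion to its carriage. The four serotypes (A to D), their carriage and invasiveness values, and the function names `post_vaccine_disease`, `best_formulation` and `most_invasive` are all hypothetical, chosen for illustration rather than taken from GPS data.

```python
from itertools import combinations

# Hypothetical, illustrative values only (NOT GPS data).
# Each serotype maps to (pre-vaccine carriage share, invasiveness),
# where invasiveness = invasive disease per unit of carriage.
SEROTYPES = {
    "A": (0.05, 10.0),  # highly invasive but rarely carried
    "B": (0.40, 3.0),
    "C": (0.30, 1.0),
    "D": (0.25, 0.5),
}


def post_vaccine_disease(serotypes, vaccine):
    """Invasive disease after vaccination under an assumed full-replacement rule.

    Non-vaccine serotypes scale up in proportion to their pre-vaccine
    carriage until total carriage returns to its pre-vaccine level.
    """
    total = sum(c for c, _ in serotypes.values())
    remaining = {s: v for s, v in serotypes.items() if s not in vaccine}
    remaining_carriage = sum(c for c, _ in remaining.values())
    if remaining_carriage == 0:
        return 0.0  # toy edge case: every serotype is in the vaccine
    scale = total / remaining_carriage
    return sum(c * scale * inv for c, inv in remaining.values())


def best_formulation(serotypes, k):
    """Exhaustively try every k-serotype formulation; keep the lowest disease."""
    return min(
        combinations(sorted(serotypes), k),
        key=lambda v: post_vaccine_disease(serotypes, set(v)),
    )


def most_invasive(serotypes, k):
    """Naive strategy: target the k serotypes with the highest invasiveness."""
    ranked = sorted(serotypes, key=lambda s: serotypes[s][1], reverse=True)
    return tuple(sorted(ranked[:k]))


if __name__ == "__main__":
    baseline = post_vaccine_disease(SEROTYPES, set())
    print(f"no vaccine: {baseline:.3f}")
    for name, strategy in [("most invasive", most_invasive),
                           ("optimised", best_formulation)]:
        v = strategy(SEROTYPES, 1)
        print(f"{name}: {v} -> {post_vaccine_disease(SEROTYPES, set(v)):.3f}")
```

With these made-up values, a one-serotype vaccine against the most invasive serotype (A) lowers the disease score from 2.125 to about 1.71, whereas the exhaustive search picks B, which brings it to about 1.54; targeting C alone would actually raise disease above the pre-vaccine level. Exhaustive search is only practical for toy examples like this, because the number of possible formulations grows combinatorially with the number of serotypes.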

Unfortunately, PCVs are expensive to develop, produce and roll out. According to Professor Klugman, economies of scale mean that a vaccine formulated for a single country or region is likely to cost several-fold more than a single vaccine rolled out globally. Furthermore, adding serotypes that mainly benefit low- and middle-income countries to updated PCVs has so far met little pushback from manufacturers and funders, and has not significantly reduced vaccine efficacy. However, as Colijn notes, “We have no idea what the cost is of not developing the appropriate vaccine for the appropriate population. There could be a risk of deploying vaccines that may not work very well, and which undermine trust in the product and vaccination in general, which could impact lots of other infectious diseases worldwide. The people or groups that bear the cost of developing vaccines are not the same as those that bear the cost of not developing them”.

For now, it seems, further work is needed to apply genomics to longitudinal carriage surveys of the pneumococcus and to assess the implications of such models in the real world.


Although this was not a goal at the outset, over the past decade GPS has become a poster child for applying genomic surveillance of bacterial pathogens at sufficient scale to have real-world public health impact. Looking forward over the next decade, assuming that costs continue to fall and that pathogen-agnostic sequencing approaches can be developed and applied at scale to clinical samples, efforts to control many other pathogens will benefit from the lessons learned through GPS. In settings without adequate microbiology and antimicrobial susceptibility testing, obtaining antibiotic and vaccine susceptibility data directly from genome sequences will become standard for tailoring the treatment of individual patients. Interrogating genomic data through modelling will also enable a better understanding of pathogen evolution, selection, and responses to antimicrobials, vaccines or non-pharmaceutical interventions, strengthening public health policy overall.
