

Image Credit: AdobeStock
Using genomics to identify the exact strains of bacteria responsible for cholera outbreaks has overturned centuries of thought about the very nature of the disease, highlighting the importance of human-to-human transmission as the major cause of the current pandemic. Integrating genomics into cholera surveillance efforts will be vital to track and ultimately end the outbreak, as well as reduce the threat of future cholera epidemics.
A history of defining diseases
Definitions are vital for tackling infectious diseases effectively. Without understanding the nature of the microbe causing an illness, identifying appropriate treatment measures is essentially blind luck.
For example, long before the microscopic explorations of the Dutch scientist Antonie van Leeuwenhoek led to the first descriptions of microbes (or animalcules, as he called them) in the 1670s, it was generally thought that diseases such as bubonic plague were spread by miasma, a poisonous vapour contaminated with dead matter. Miasma theory was widely accepted in ancient and medieval civilizations in Europe and China, and not entirely without justification given the observational tools available at the time. Infectious diseases were more prevalent in cities and towns contaminated by human and animal waste, where the air would have been truly noxious. Sparsely populated places with ‘less miasmatic’ air would naturally have seemed freer from disease, despite the true cause of the illness being invisible before the tools of modern science were available. Even now, in the midst of a pandemic caused by what we know to be an airborne respiratory virus, increasing ventilation to remove contaminated air is acknowledged to be among the most effective measures for preventing transmission - an ancient remedy in modern times.
While the idea that microbes might be responsible for causing a range of diseases slowly percolated through scientific thinking following van Leeuwenhoek’s initial observations, it was not until the mid-1800s that germ theory finally chased miasma out of the window, through the work of John Snow, Louis Pasteur, Robert Koch, Joseph Lister, and their many contemporaries.
Snow, a physician and founding member of the Epidemiological Society of London, which formed in response to the cholera outbreak of 1849, had a clear view that cholera was not a result of miasma but was instead caused by ‘cells’ that replicated in the intestine and spread by the faecal-oral route. He also advised that cholera could be prevented by using water that had been filtered and boiled before use. However, it was his epidemiological deductions during the Broad Street cholera outbreak of 1854 that helped to firmly define cholera as a waterborne disease.

Map from the book "On the Mode of Communication of Cholera" by John Snow, originally published in 1854 by C.F. Cheffins, Lith, Southampton Buildings, London, England.
Working with the priest Henry Whitehead, Snow spoke with local residents and built up an understanding of the pattern of cases (which he would later record on a now-famous dot map) that enabled him to identify the public water pump on Broad Street as the likely source of the outbreak. This was convincing enough for the local council to disable the pump by removing its handle. Snow then tracked down a cesspool that had leaked into the water table, likely causing the outbreak. Snow’s work enabled cholera to be defined and treated as a waterborne disease, which has saved a great many lives in the years since and still influences cholera control efforts today.
Coming into focus
As elegant as Snow’s work was, in the modern era, tracking cholera using symptom-based epidemiology doesn’t really cut it. After all, while a rapidly expanding number of diarrhoea cases might be indicative of cholera, such outbreaks could be caused by a range of other microbes. Only by establishing a more detailed definition of the disease-causing bacterium is it possible to accurately determine whether it is cholera, how individual cases in a potential outbreak are linked, and whether outbreaks in different locations are caused by the same strain.
Biochemical, molecular, and more recently genomic, analyses have moved things along since Snow’s time and we now know that Vibrio cholerae is a diverse bacterial species found associated with plankton in rivers and estuaries around the world.
Before genome sequencing became feasible, several different approaches were used to describe V. cholerae strains and how they relate to each other. Typing of the O antigen, a component of the bacterium’s outer layer, based on antibody binding has been used to define more than 200 serogroups of V. cholerae, yet only cholera toxin-producing O1 serogroup strains have caused cholera pandemics. Another layer of description, the biotype, refers to groups of strains that share the same biochemical and genotypic traits. Within the O1 serogroup are two biotypes, classical and El Tor. Add into the mix levels of description based on other biochemical properties, or the presence of prophage in the bacterial DNA, and the pre-genomic picture of which strains of V. cholerae were causing outbreaks and epidemics became increasingly complex. This led to a view that cholera epidemics were caused by a diverse set of V. cholerae strains.
Over time, the consensus view emerged that sporadic cholera outbreaks are caused by a wide range of O1 and O139 V. cholerae strains present in local water systems around the world. Known as the cholera paradigm, this accorded with Snow’s view of cholera outbreaks being caused by contaminated water sources. It also led to suggestions that cholera epidemics and pandemics were not only spread by shipping and trade routes but were also influenced by climatic events, and that the patterns of outbreaks and epidemics could be understood and predicted on this basis [1].
The potential role for the climate in driving epidemic and pandemic cholera was an attractive idea given the often curious patterns of disease recurrence. There have been seven cholera pandemics over the past 200 years, with the first originating in the Ganges Delta in India in 1817. While some places see occasional outbreaks of cholera with no recurrence, others, such as Bangladesh and West Bengal, are hotspots for cholera with annual seasonal outbreaks. By contrast, many African countries are cholera hotspots but with inconsistent patterns, and may go five to ten years between major outbreaks [2]. To better understand the factors influencing patterns of disease outbreak, and what distinguishes one pandemic from the next, a more accurate genomic definition for the causal V. cholerae strains was needed.
Redefining cholera with genomics
Compared to many other diarrhoeal disease-causing bacteria, V. cholerae was relatively understudied. By 2010, only a handful of reference genome sequences had been published. To address this, Ankur Mutreja, Dong Wook Kim and Nick Thomson worked with a worldwide network of collaborators to create a collection of clinical isolates of V. cholerae dating back to the beginning of the seventh pandemic in 1961. They then sequenced the genomes of 136 V. cholerae strains, including 113 from their collection of seventh pandemic strains. They compared the genomes to create a family tree showing how all these strains were related.
If the cholera paradigm held true, they would have expected to see a tree that showed clear geographic signals, with multiple major branches corresponding to different regions of the world. Surprisingly, this turned out not to be the case.

Family tree of strains of cholera collected during the seventh cholera pandemic. Image credit: Nature 2011; 477: 462-465.
“The tree was extraordinary; it looked more like a virus than an environmental bacterium. It was an asymmetrical tree with very strong temporal and geographic signature. It was one of the most beautiful phylogenetic trees I had ever seen in its complete lack of balance or general evolutionary poise. The tree said that the seventh pandemic resulted from the clonal expansion of a single lineage that has spread globally in distinct waves, essentially the same as we have seen for COVID-19.”
Professor Nick Thomson,
Group leader and head of the Parasites and Microbes Programme, Wellcome Sanger Institute
For a genome with around five million base pairs, there were fewer than 500 individual mutations spread across all of the clinical isolates sequenced. This went entirely against the idea that cholera epidemics are caused by natural populations of strains from environmental reservoirs, which would have been far more diverse [3].
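To put those figures in proportion, the short calculation below works out the upper bound they imply on how much of the genome varied. It is only a back-of-the-envelope sketch: the constants are the rounded values quoted above ("around five million" base pairs, "fewer than 500" mutations), and the function name is invented for illustration rather than taken from the study.

```python
# Rounded figures quoted in the text; both are approximations, not exact counts.
GENOME_LENGTH_BP = 5_000_000  # "around five million base pairs"
MAX_MUTATIONS = 500           # "fewer than 500" mutations across all isolates


def variable_site_fraction(n_mutations: int, genome_length: int) -> float:
    """Return the fraction of genome positions carrying a mutation.

    With the "fewer than 500" figure as input, this is an upper bound.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return n_mutations / genome_length


fraction = variable_site_fraction(MAX_MUTATIONS, GENOME_LENGTH_BP)
bp_per_mutation = GENOME_LENGTH_BP // MAX_MUTATIONS

print(f"At most {fraction:.4%} of positions vary")
print(f"Roughly one mutation per {bp_per_mutation:,} base pairs or more")
```

In other words, at most about one position in every 10,000 differed across the whole collection, which is the kind of near-identity expected from a single expanding clone rather than from a varied environmental population.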
To make sure that they hadn’t missed major branches through biased sampling, and to define the exact strain responsible for as many epidemics as possible, Thomson together with colleagues at the National Collection of Type Cultures, a non-profit culture collection repository in the UK, and collaborators in Africa and Latin America, delved into historic collections to obtain and sequence V. cholerae isolates dating back as far as 1900. They saw that nearly all isolates gathered since 1961 were part of the single seventh pandemic lineage, the only exception being a strain from a contaminated well in Sudan in the late 1960s, which caused around 1,000 cases but showed no evidence of widespread transmission and essentially spread no further.
Importantly, the genomic picture they were drawing closely supported the WHO’s most recent epidemiological data, showing spread of the seventh pandemic from East to West between 1961 and the 1990s. Based on the pattern of genetic changes observed, Thomson, François-Xavier Weill and colleagues were able to demonstrate with precision that the V. cholerae strains from the West African coast seeded the epidemic in Peru in the 1990s [4,5].
WHO cares
The implications of this new genomic view of cholera were profound. They showed that natural environmental populations of V. cholerae and epidemic V. cholerae are very different beasts. Environmental V. cholerae causes sporadic outbreaks, with a limited number of cases that spread no further. By contrast, cholera epidemics are caused by individual lineages of V. cholerae that appear to have adapted for human-to-human transmission, and that have spread worldwide on the back of human movement, one lineage for each of the seven pandemics.
“The data were disruptive at every level. We have spent the subsequent decade focused on convincing the public health community that seventh pandemic and perennial environmental V. cholerae strains move very differently. Our message is that you cannot control V. cholerae, which is in waterways everywhere, but you can control the epidemic cholera lineage responsible for the seventh pandemic by focusing on disrupting human-to-human transmission chains.”
Professor François-Xavier Weill,
Institut Pasteur
Discussions were already ongoing about using genomics as the gold standard for defining bacterial strains, and the WHO, which had established the Global Task Force for Cholera Control, was on board with integrating genomic definitions into cholera control efforts. Accordingly, the role of the specific V. cholerae lineage in causing the current pandemic has been recognised and the need for genomic definitions is now being included in local and regional cholera control plans. Genomic definitions are also being included in processes designed to confirm the end of an epidemic and declare cholera-free status, with the WHO asking countries to use both molecular and genomic definitions when recording potential outbreaks. Ultimately this will put the onus on countries to collect and sequence samples from every outbreak, although at the moment there is an opt-out given that many cholera-endemic countries do not have the necessary infrastructure and genomic capabilities.
Cholera outbreaks have serious economic implications, with people less willing to travel to a cholera endemic country for trade or tourism, meaning that there are strong incentives for countries to declare that they are free of cholera. Furthermore, like all other infectious diseases, cholera cares nothing for national borders. As such, the real goal should be using genomics to enable entire regions to cooperate in tackling pandemic cholera with the aim of being able to declare that their region is free from epidemic cholera caused by the genomically defined seventh pandemic El Tor lineage, or 7PET for short.
Unfortunately, there is currently no unified way of collecting cholera samples from every outbreak to be sequenced, either locally or through a centralised body. This represents a missed opportunity. As has been shown for outbreaks in Bangladesh [6] and Pakistan in 2022, sequencing even a handful of samples is enough to determine whether an epidemic strain within the 7PET lineage or an environmental strain is responsible. Establishing a system for sampling of all outbreaks, as well as isolates from sporadic cases between outbreaks, would enable researchers to better define the risk of having a large-scale outbreak and, importantly, delineate the routes by which epidemic cholera spreads.
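The reason a handful of samples can be enough follows from how little the 7PET lineage varies: an isolate that belongs to it should differ from a 7PET reference at very few positions, whereas an environmental strain should differ at many. The toy sketch below illustrates only that intuition. Everything in it is an assumption made for illustration: the 20-letter sequences stand in for aligned genomes, the cut-off of two differences is arbitrary, and the function names are hypothetical. Real analyses place whole genomes on a phylogenetic tree rather than applying a simple distance threshold.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def classify_isolate(isolate: str, reference_7pet: str, max_snps: int) -> str:
    """Label an isolate by its distance from a 7PET reference.

    `max_snps` is a hypothetical cut-off chosen for this toy example only.
    """
    if hamming_distance(isolate, reference_7pet) <= max_snps:
        return "7PET-like"
    return "non-7PET (possibly environmental)"


# Toy data: invented sequences, not real V. cholerae genomes.
REFERENCE_7PET = "ACGTACGTACGTACGTACGT"
ISOLATES = {
    "sample_A": "ACGTACGTACGTACGTACGA",  # differs at 1 position
    "sample_B": "ACGTTCGTACGTACGTACGT",  # differs at 1 position
    "sample_C": "TCGAACCTAGGTTCGTACCA",  # differs at 7 positions
}
MAX_SNPS = 2  # arbitrary illustrative threshold

results = {
    name: classify_isolate(seq, REFERENCE_7PET, MAX_SNPS)
    for name, seq in ISOLATES.items()
}
for name, label in results.items():
    print(f"{name}: {label}")
```

Here sample_A and sample_B fall within the cut-off and are labelled 7PET-like, while sample_C does not. The point of the sketch is that the two kinds of strain are far enough apart that even a few sequenced isolates can separate them.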
Having a genomic definition of the V. cholerae strain behind the early cases in an outbreak could also help to ensure that appropriate control measures are directed to where they are most needed to reduce morbidity and mortality. Oral cholera vaccine stockpiles are managed through Gavi, the Vaccine Alliance, which on request from the government of an affected country can begin distributing vaccines to at-risk populations almost immediately, in concert with humanitarian responses. In 2017 and 2018, nearly 2.2 million doses of inactivated oral cholera vaccine were deployed among Rohingya refugees and the wider population in Bangladesh, with 750,000 people receiving the vaccine within a two-week period at one point [7]. In this case, swift action saw the vaccines rolled out before the strain responsible for the early cases had been characterised, which was fortunate given that subsequent analysis revealed it to be a high-risk 7PET epidemic strain. However, if an environmental strain had been responsible, a vaccination programme might have diverted vaccine stocks and resources away from other outbreaks of epidemic cholera that would not have been naturally self-limiting. This is particularly important given that cholera vaccine production is limited and there are significant challenges ahead in even maintaining current global stockpiles of cholera vaccine [8], with the manufacturer of one of the two cholera vaccines used in humanitarian emergencies halting production by the end of 2023 [9].
Professor Munirul Alam, from the International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b) outlines the insights genomic surveillance could reveal: “The Ganges Delta of Bay of Bengal, Bangladesh, is the historic hotspot for global cholera. We used genome sequencing to investigate the 2022 cholera outbreak in Dhaka, Bangladesh, finding evidence for a new subclade of the 7PET lineage designated BD-1.2. Strains within the BD-1.2 subclade caused a massive cholera outbreak, displacing other 7PET strains that were locally dominant at the time providing evidence that cholera is not only transmitted from but also imported to Bangladesh.”
“The genomes of BD-1.2 strains revealed unique mutations in genes that promote growth, resistance to bile salt, cell wall organization, and toxigenicity. The speed with which BD-1.2 came to dominate existing strains in Dhaka is particularly concerning and suggests that this new subclade may cause even more devastating epidemics. Without genomic surveillance, we would be entirely in the dark about the changing nature of the bacterium responsible for such cholera outbreaks.”
Professor Munirul Alam,
International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka, Bangladesh
Eternal vigilance through genomic surveillance
Knowing that each of the seven cholera pandemics over the last 200 years was caused by an individual V. cholerae lineage that has become adapted for human-to-human transmission brings with it another major implication. We can, and must, strive to end the ongoing seventh pandemic as soon as possible. However, the potential for cholera outbreaks will remain forever. The ancestors of V. cholerae lineages that might in future adapt to human-to-human transmission, and go on to cause the next and subsequent pandemics, are out there in the environment right now.
Using genomics to monitor all cholera outbreaks will allow us to identify lineages that show signs of becoming better adapted for human-to-human spread, and to stop them in their tracks early on before they can start a pandemic. To adapt a phrase that dates from Snow’s era, the price of liberty from pandemic cholera is eternal vigilance through genomic surveillance.
Find out more
References
1. Colwell R.R. Science 1996; 274: 2025-2031. DOI: 10.1126/science.274.5295.2025
2. Sack D.A. et al. J Infect Dis 2021; 224(s7): S701-S709. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8687066/
3. Mutreja A. et al. Nature 2011; 477: 462-465. https://www.nature.com/articles/nature10392
4. Weill F-X. et al. Science 2017; 358: 785-789. https://www.science.org/doi/10.1126/science.aad5901
5. Domman D. et al. Science 2017; 358: 789-793. https://www.science.org/doi/10.1126/science.aao2136
6. Monir M.M. et al. Nature Communications 2023; 14: 1154. https://www.nature.com/articles/s41467-023-36687-7
7. Qadri F. et al. The Lancet 2018; 391: 1877-1879. https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(18)30993-0/fulltext
8. Holmgren J. Tropical Medicine and Infectious Disease 2021; 6: 64. https://www.mdpi.com/2414-6366/6/2/64
9. Davies L. The Guardian 14 October 2022. https://www.theguardian.com/global-development/2022/oct/14/who-dismay-key-oral-cholera-vaccine-shanchol-discontinued-amid-unprecedented-global-outbreaks