Sequencing COVID: our latest stats

DNA Pipelines team, Sanger Institute

As the sequencing hub of the COVID-19 Genomics UK Consortium (COG-UK), the Sanger Institute is sequencing the genomes of coronavirus samples from across the country. We are often asked how many genomes we sequence, how quickly, and how the data are being used. Below is a snapshot of some of the key numbers, which is updated weekly*.

As of 19/07/2021, we have sequenced 438,145 coronavirus genomes. The whole of the UK has sequenced 666,250**. All are freely available for analysis via COG-UK, ENA and GISAID. Across the globe, there are a total of 2,401,456 sequences available for public analysis***.
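To put these figures in proportion, the short sketch below works out what share of the UK total was sequenced at Sanger, and what share of the globally available sequences came from the UK. It uses only the three numbers quoted above; the variable and function names are ours, chosen for illustration. Because the UK and global counts come from different sources (the COG-UK dashboard and GISAID), the second percentage should be read as approximate.

```python
# Figures quoted in the text above (as of 19/07/2021).
sanger_genomes = 438_145    # genomes sequenced at the Sanger Institute
uk_genomes = 666_250        # genomes sequenced across the UK (COG-UK dashboard)
global_genomes = 2_401_456  # sequences publicly available worldwide (GISAID)


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


sanger_share_of_uk = share(sanger_genomes, uk_genomes)
uk_share_of_global = share(uk_genomes, global_genomes)

print(f"Sanger share of UK genomes: {sanger_share_of_uk}%")
print(f"UK share of global public sequences: {uk_share_of_global}%")
```

On these numbers, roughly two thirds of UK genomes were sequenced at Sanger, and a little over a quarter of all publicly available sequences came from the UK.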

COVID Genomes Sequenced

[Infographic counters: SARS-CoV-2 genomes sequenced by Sanger; Sanger Institute staff involved; SARS-CoV-2 genomes sequenced by COG-UK]

At the Sanger, we have received and handled more than 20 million samples since March 2020. These samples are the residues of PCR tests, which test for the presence of the virus. Between 8 and 20 per cent of the samples we receive are positive. We are increasing capacity so that we can sequence 20,000 genomes a week. So far, 300 staff have been involved in the effort at Sanger, including teams working on logistics, software, sequencing, research and development, operations and analysis. The operation runs 7 days a week.

Funding for this sequencing work has come from the Sanger Institute, the Department of Health and Social Care and UKRI. Together, these organisations have contributed £32 million to the consortium.

The UK is becoming the world’s microscope for COVID-19. With large-scale genomic sequencing, we can see how the virus is evolving day by day. We can monitor for new variants and observe them as they move. As vaccines are being rolled out, we are also the world’s binoculars: we can see what is coming over the horizon, and how the virus will respond and evolve in response to those vaccines.

Professor Sir Mike Stratton, Director of the Sanger Institute

The work builds on the Sanger Institute’s history and capacity for genomic surveillance in other diseases including malaria, cholera and the monitoring of antibiotic resistance in a range of bacteria.

[*Please note these numbers are subject to change. For the latest statistics, please contact the Sanger Institute or COG-UK. **https://majora.covid19.climb.ac.uk/public/dashboard. ***https://www.gisaid.org/]

Genomic surveillance – new mutations and variants

Sequencing of the SARS-CoV-2 virus allows researchers to track its genomic mutations. Mutations occur naturally as the virus replicates in our bodies, though most do not affect its functions. Mutations that affect the virus’s spike protein, which it uses to bind to and enter human cells, are of particular interest.

Sequencing genomes at this volume, across the whole country, means there are comprehensive data that can be used to rapidly assess how the virus is evolving. Scientists use the genomic data alongside other information to assess which mutations may affect the virus’s ability to transmit, cause disease or evade the immune response, which may come from a previous infection with the virus, or a vaccine.

Visualisation of SARS-CoV-2 genome data in the UK from Microreact

B.1.1.7 was first identified by researchers from COG-UK, public health agencies and the Sanger Institute in late 2020 in South East England. They carried out statistical analysis to show that B.1.1.7 was more transmissible than previous variants, and was largely responsible for the rise in cases at the time. There is now strong evidence that the variant is 50 per cent more transmissible than previous variants.

Find out more about the discovery of B.1.1.7 in this podcast.

Genomic surveillance – vaccine escape

As vaccines are given across the UK, a major question is how the virus will respond. There is a possibility that new mutations and variants will enable it to ‘escape’ vaccines. Sequencing any virus that infects a person who has had the vaccine is a priority for COG-UK. Early detection of any such variants will enable public health authorities, and vaccine manufacturers, to act swiftly.

Genomic surveillance – outbreak tracking

The sequences are being used by researchers to trace outbreaks in hospitals, towns or regions across the UK. Scientists are able to use the genomes as ‘barcodes’ to follow the virus as it moves from person to person. The data help establish routes of transmission – they may be able to rule out or rule in a specific path, or identify a super-spreading event.
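As a rough illustration of the ‘barcode’ idea, the sketch below counts the positions at which pairs of aligned sequences differ. Very similar genomes are compatible with a direct transmission link, while genomes that differ at many positions make one less likely. This is a simplified sketch and not the method used by COG-UK: the sequences are short, made-up fragments (a real SARS-CoV-2 genome is around 30,000 bases long), the sample names and the function `snp_distance` are hypothetical, and real investigations combine whole-genome comparisons with epidemiological information.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ.

    Positions where either sequence has an ambiguous base (N) or a gap (-)
    are skipped, since they carry no evidence either way.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignored = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignored and b not in ignored
    )


# Toy, made-up fragments for illustration only (not real viral sequences).
samples = {
    "patient_A": "ATGCTAGCTAGGCTA",
    "patient_B": "ATGCTAGCTAGGCTA",
    "patient_C": "ATGTTAGCTNGGATA",
}

# Compare every pair of samples and report how many positions differ.
pairwise = {
    (name_1, name_2): snp_distance(samples[name_1], samples[name_2])
    for name_1, name_2 in combinations(samples, 2)
}

for (name_1, name_2), distance in pairwise.items():
    print(f"{name_1} vs {name_2}: {distance} difference(s)")
```

In this toy example, patients A and B carry identical fragments, so a link between them cannot be ruled out, whereas patient C differs from both at two positions.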

The sequence data are being passed to public health officials as soon as they are available, and are being used to help inform control measures.

Find out more on the COG-UK website.

The process

The whole process, from samples arriving on site at the Sanger, to the data being uploaded to public databases, takes about 5 days. The sequencing itself takes around 24 hours. Thousands of samples are run in parallel, enabling high throughput.

See our photo essay, or the video below, for details of how we process and sequence coronavirus samples.

The future

Professor Sharon Peacock is Director of COG-UK. She says: “Genomic sequencing data, when combined with epidemiological and clinical information, can make a difference in controlling disease. Thankfully, unlike in past epidemics, we’re able to use this tool quickly and at a huge scale.

“As more mutations accumulate in the virus’s genetic code in different combinations, there is going to be more complexity. This isn’t good news, but at least with the genomic surveillance we have now, we can see what’s coming, and take action – this will include more intensive surveillance, testing and contact tracing in particular areas, and provision of the latest genome information to vaccine manufacturers.”


Find out more

COVID-19 research at the Sanger Institute