‘It had better be good’

The output from an automated DNA sequencing machine used by the Human Genome Project to determine the complete human DNA sequence. This sequence is part of human chromosome 1.

By Alison Cranage, Science Writer at the Wellcome Sanger Institute

The words spoken by Fred Sanger to John Sulston, upon hearing that the Sanger Institute was to be named after him, were: ‘it had better be good’.

The Sanger Institute was founded in 1992, with Sulston as director, to uncover the code of life – the human genome. The mission included hundreds of scientists and many organisations around the globe. The international partnership raced to determine the human genome sequence ahead of private efforts. Their goal was to make the knowledge open and available for all.

Huge amounts of effort, time and money were invested world-wide. Yet some called the value of the mission into question. It was a fishing expedition, and the catch was far from certain.

Now, 27 years later, we hope we have done Fred Sanger proud. The Sanger Institute became the largest single contributor to the Human Genome Project. Determining the seemingly simple order of four letters of DNA has transformed science in so many ways, changing how we understand life on Earth.

Today marks the 20th anniversary of the publication of the first draft human genome sequence – a historic moment in biology. Here we reflect on what followed for the Sanger Institute, and what the future holds.

John Sulston and Michael Morgan celebrate the publication of the draft human genome sequence

From one to everyone

The first draft of the human genome sequence was completed in June 2000, to much celebration. Yet, as is ever the case in science, the sequence itself provided more questions than answers. Analysis showed that the vast majority of the sequence doesn’t code for the proteins that build and run our cells and our bodies. What is the rest of our DNA doing? Is it a remnant of our evolutionary past or an important part of the code for a human? New questions arose about how our genomes function, and about how they vary from one person to the next. Arguably the most important questions were, and remain, about how our genomes are linked to health and disease.

Directly following the Human Genome Project, research programmes were set up alongside Sanger’s sequencing facility. Scientists established and led larger projects, each built on the one before. For example, the HapMap project, then the CNV project, the 1000 Genomes Project and UK10K sought to uncover more and more about human genomic variation. Other projects went on to investigate the origins of humankind, studying ancient DNA and piecing together the journeys our ancestors made around the world.

Just a few years after the draft human genome sequence was published, a project to put the knowledge to use in the clinic was started. Working with colleagues in the NHS, Sanger Institute researchers established DECIPHER, to help children with undiagnosed rare genetic conditions. This, and the subsequent Deciphering Developmental Disorders study, have led to diagnoses for families, improved healthcare and the discovery of new genetic conditions. They were also the foundations of the government-led 100,000 Genomes Project and subsequent NHS genomics service that launched in 2018 – enabling whole genome sequencing for all rare disease patients who might benefit.

Now, we sequence DNA at a rate equivalent to a human genome every 3.5 minutes. We’ve sequenced over 10 petabases of DNA since 1992. The first five petabases took 25 years, the next five, just 13 months.

The latest acceleration is down to one project – UK Biobank. We are working to sequence 250,000 of the 500,000 human genomes from volunteers in the project. Linking participants’ DNA sequence data to detailed information about their health and lifestyle means researchers can understand more than ever before about the connections between genomes and health, including heart diseases, stroke, diabetes, arthritis, osteoporosis and depression. The huge numbers mean there is enough statistical power to identify those connections, even when they are rare, or their effects are small. 

Cancer

A disease caused by a disordered and dysregulated genome, cancer has been a focus of Sanger scientists since the completion of the Human Genome Project. Institute researchers were pioneers in the sequencing of tumour genomes, founding global projects to unravel the genetics of cancer. Thousands of tumours have now been sequenced, and their characteristics catalogued.

Sanger scientists have discovered genes that, when altered, can cause cancer cells to grow or can suppress cancer growth. Their work has led to clinical trials and effective new cancer treatments. In 2004, researchers started a database, known as COSMIC, to record all of the genetic changes in all types of cancer. This team now curates the world’s knowledge of cancer genetics.

Other teams are working to characterise cancer cell lines and tumour organoids – 3D balls of cells grown in the laboratory that represent cancer in people. They use these cancer models to screen drugs, powering the search for new precision cancer therapies. Researchers have also uncovered patterns of DNA change across the genomes of cancer cells to identify tell-tale signatures, which can indicate the causes of cancer.

The common thread

“One should not underestimate how important this event is in human history. Over the decades and centuries to come this sequence will inform all of medicine, all of biology, and will lead us to a total understanding of not only human beings but all of life. Life is a unity, and by understanding one part you understand another.”

Professor John Sulston, Director of the Sanger Institute, speaking about the publication of the human genome in 2000

All life on Earth is connected by the common thread of DNA. Every single living thing is built from a different arrangement of the same four letters of code.

Alongside the human genome, the Institute led the first sequencing of a eukaryote genome (yeast) and of an animal genome (nematode worm), and contributed to the sequencing of species important for scientific study, including the mouse and zebrafish. Sanger Institute researchers have also led the world in sequencing microorganisms that cause the world’s deadliest infectious diseases. These include the agents of malaria, MRSA, cholera and pneumonia as well as neglected tropical diseases such as schistosomiasis and Guinea worm. The aim is to understand these organisms and how they infect and interact with their human hosts.

Sanger scientists have also established global projects to undertake genomic surveillance of important pathogens. Teams track the spread of pathogens, their evolution and their growing resistance to drugs, including antibiotics. There is emphasis on building capacity in low- and middle-income countries, so that genome sequencing and analysis can be done locally.

Genomic surveillance of countries, regions and households is enabling public health authorities to target interventions and save lives. In 2012 a team stopped an MRSA outbreak in a neonatal intensive care ward in a Cambridge hospital. More recently, Institute scientists, together with researchers in South East Asia, used genomics to show that resistance to antimalarials was rapidly developing in the region. The data is being used to inform prescribing policies.

Now, in the global pandemic, researchers are sequencing genomes of the SARS-CoV-2 coronavirus from UK patients with COVID-19, with the same aims – tracing transmission and informing public health initiatives.

Faster, smaller

Whilst the speed and capacity for genome sequencing have rapidly increased, the amount of material required for sequencing has been minimised – from the microscopic to the nanoscopic. Researchers are now able to sequence DNA or RNA from just a single cell.

This powerful technique allows the study of an individual cell’s activity, by measuring which parts of the genome it is using at any one point in time. High-throughput machines mean hundreds of thousands of cells can be analysed at once, and advanced methods mean it is possible to do this in situ – locating cells, and their activity, within a tissue or organ.

The Human Cell Atlas, co-founded by the Sanger Institute, is capitalising on these advances. Since its launch in 2016, scientists have been sequencing 30-100 million single cells from the human body – out of a total of roughly 37 trillion. The aim is to create a comprehensive, 3D reference map of all human cell types. This will lead to a deeper understanding of cells as the building blocks of life. It will form a new basis for understanding human health and diagnosing, monitoring, and treating disease. The initiative has already led to discoveries of new cell types in the immune system and the lungs, with important implications for disease.

All life

One of the newest research programmes at the Sanger Institute is part of another global mission. The goal is to sequence all complex life on the planet – 1.5 million known eukaryotic species. Fewer than one per cent have had a genome sequenced so far.

Detractors have asked the same questions of this project as of the Human Genome Project: why do it, what will you find? And perhaps we don’t yet know, exactly – but that is the point.

We do know that the genome sequences will help with conservation and biodiversity studies. They will enable researchers to delve deep into evolution, understanding better how we came to be. The genome sequences will also enhance research into medicine, new biofuels and agriculture. There is a wealth of creatures and molecules waiting to be discovered. The project will push biology once again into the unknown.

The future

The first human genome took 10 years to reach a working draft, and another decade to finish, at a cost of $3 billion. When the project began, no-one knew where it would lead. Now, we sequence DNA at a rate equivalent to a human genome every 3.5 minutes, and at a tiny fraction of the cost of the first human genome.

Sequencing humans continues at such a pace that the numbers of people with their whole genome sequenced are likely to soon be in the millions. Genomic scientists grapple with petabytes of data. As sequencing becomes routine, there are important issues to consider. We inherit our genomes from our parents, and so the knowledge of our own sequence could have impacts for our families too. As individuals and as a society, are we ready? Some of our researchers are seeking to find out, and to explore the social, legal and ethical issues raised by genomics.

Yet, while a genome sequence in itself can tell us a lot, it isn’t everything. Our genome is regulated in a myriad of ways. It is influenced by our environment and ever-changing. There are many complexities yet to unravel.

Some things haven’t changed since 2000. The principles of openness that underpinned the Human Genome Project are still core to the Sanger Institute. All of our data, projects, tools and discoveries are open for others to use and build upon.

“The project showed that international teams could work in close cooperation (or friendly rivalry) to produce a public good that would benefit all of humanity. This spirit of openness, collaboration and common good has now permeated biology, where principles of open access and open data have transformed and enhanced understanding.”

Professor Mark Blaxter, Tree of Life Programme Lead at the Sanger Institute.

“The Human Genome Project is now viewed as one of the most significant endeavours in the history of science, providing new foundations for the study of human biology in health and disease. It revealed the full set of components of the molecular machinery of the human body, providing a basis for understanding how they work together, paved the way to greater insight into human evolution, and dramatically accelerated understanding of the full range of human diseases, leading to new diagnostics and new treatments. It also changed the nature of biological science by generating huge amounts of data to describe the approximately 3,000,000,000 base pairs of DNA code of the human genome, fostering a new breed of biologists with computational skills and the requirement for ever greater computer power to store and analyse human genomes.

“Finally, and of transformative importance for biological science generally, it introduced a new scientific culture in which massive ambition is achieved through global collaboration of large groups of scientists and in which open sharing of data, to be used by all for the benefit of all, is at the centre.”

Professor Sir Mike Stratton, Director of the Sanger Institute.

John Sulston was right. Life is unity. And science, open and available for all, is more important than ever.

