Sequencing and the Tree of Life

Eurasian otter (Lutra lutra). Image credit: Karen Miller via AdobeStock

By Ali Cranage, Science Writer at the Wellcome Sanger Institute

All life on Earth has something in common – DNA. Yet only 0.3 per cent of the 1.5 million known eukaryotes (plants, animals, fungi and protists), roughly 4,500 species, have had their genomes sequenced. Knowledge of the genome sequences of those few species, including humans, has transformed our understanding of biology and evolution. It also provides the foundation for new approaches to medicine, agriculture and conservation.

The Earth BioGenome Project, launched in 2018, aims to sequence the genomes of all eukaryotes on the planet.

Professors Harris Lewin, Mark Blaxter and Jenny Graves spoke today at the American Association for the Advancement of Science (AAAS) meeting in Seattle about the promise and progress of the mission.

Biology’s next moonshot

Professor Harris Lewin is the Robert and Rosabel Osborne Endowed Chair and Distinguished Professor of Evolution and Ecology at the University of California, Davis, and chair of the Earth BioGenome Project working group.

He introduced the Earth BioGenome Project. “It’s a ‘moonshot’ for biology that has the potential to transform our understanding of life on Earth and address some of the most critical challenges faced by humankind today,” he said.

One of those challenges is that biodiversity is disappearing at unprecedented rates. The Living Planet Index recently reported that vertebrate populations have declined by more than 52 per cent on average over the last 40 years. Harris also highlighted that 27 per cent of plant and animal species are threatened with extinction.

In 2015, a group of 24 scientists began to think about what it would take to sequence the genomes of all eukaryotes. Their detailed plan was published in 2018, and Harris described progress to date. Over the last 18 months, the project has grown and become a ‘network of networks’ bringing together over 5,300 scientists, 24 affiliated projects, and 30 institutions in 15 countries. All affiliates have committed to making data open and available to the scientific community.

He drew comparisons with the Human Genome Project – completed in 2003. The estimated cost of sequencing all known eukaryotes is less than the cost of sequencing the first human genome, after accounting for inflation.

The strategy is to sequence a representative species from each of the 9,300 known taxonomic families in the first three years. Phase two will add a representative from each genus, and the aim is to sequence all known species within 10 years. Progress is well under way: projects such as the Vertebrate Genomes Project have published over 100 genomes in the last year.

Another approach is to sequence all the species in a geographical area – a ‘deep sequencing’ approach to understanding every species in an ecosystem. That knowledge can then serve as a platform for monitoring the effects of climate change and other pressures.

Image credit: Alex Cagan @ATJCagan

Harris closed by speaking again about the ‘why’ of the project. “It gives us the possibility to define all of the genetic relationships, evolutionary history, and the origin of eukaryotic life.”

“In agriculture it’s important to know that there are only a dozen crop species that provide 75 per cent of all our food. So being able to sample and sequence the wild relatives of these species could provide new sources of genetic variation for improvement of crops.”  

“It will give us resources for the new bioeconomy, new biomaterials and aid conservation. We want to provide a complete digital library of life on Earth.”

Harris Lewin speaking at the 2018 launch of the Earth BioGenome Project.

Darwin

Professor Mark Blaxter, lead for the Tree of Life programme at the Sanger Institute, spoke next. He opened with a quote from Darwin.

“By sequencing all life on Earth we can look not only at the history of evolution but at evolution that is happening right now,” he said.

He then described the Linnaean mission to name every species. Running for 300 years and involving thousands of scientists, it means we can all use the same name for the same thing. It’s a huge open science project. “I think of this as a catalogue of life, a library of life. You have a name for everything. We want to turn this catalogue into something that has content.”

“We decided to start locally…and sequence all the species in Britain and Ireland. We have one of the best known biotas in the world. We have naturalists identifying what’s there, cataloguing it.”

“14,000 years ago, the UK was covered in ice. Now we have ecology – trees, animals and plants. We have an example of an ecosystem that reconstructed itself after major climate change. In the genomes of those organisms will be signatures of that response to climate change. We can use them to explore the responses to climate change today.”

To undertake the project, the Sanger Institute has joined with a range of organisations that bring the expertise, skills and resources needed to complete the work, including the Natural History Museum, the Royal Botanic Gardens, Kew, the Earlham Institute, CABI, EMBL-EBI, the Marine Biological Association and many others.

“We’re starting by doing one species in every family. If we do those 4,000, we’ll have the sequence of about 40 per cent of all the families in the world.” The aim is to complete those 4,000 in the next three years.

Image credit: Alex Cagan @ATJCagan

The process

Mark described the process. “We start with a specimen, collected with all the metadata about where and when it was found, for example. It will be identified by an expert, and we will do a DNA barcode of the specimen.”

Next, DNA is extracted, ideally in very long stretches, ready for sequencing. The process still breaks the DNA into fragments, so after sequencing, the raw reads are ‘assembled’ computationally, by finding where they overlap, to reconstruct the sequence of entire chromosomes.
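For readers curious about what ‘assembling’ means in practice, here is a toy Python sketch of the core idea: find where reads overlap, then merge them into longer stretches. It is only an illustration under simplifying assumptions (short, error-free reads from a single DNA strand, and no repeated sequence). The function names `overlap` and `greedy_assemble` are hypothetical, and this greedy merging is a classroom simplification, not the method used by the Sanger Institute or by the long-read assemblers real projects rely on.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Only overlaps of at least `min_len` letters count; otherwise return 0.
    """
    start = 0
    while True:
        # Look for the first `min_len` letters of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check whether everything from that point to the end of a matches b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler (an assumed simplification): repeatedly merge the
    pair of fragments with the largest overlap until none of at least
    `min_len` letters remain."""
    # Drop duplicate reads, then any read wholly contained in another read.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            # No overlaps left: return the assembled stretches ("contigs").
            return contigs
        i, j = best_pair
        # Join the two fragments, writing the shared overlap only once.
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["ACGTTG", "TTGCAG", "CAGGTC"]
print(greedy_assemble(reads))  # ['ACGTTGCAGGTC']
```

Here three overlapping reads are stitched back into the 12-letter sequence they came from. Real genomes contain repeats longer than many reads, which can fool this kind of merging. That is one reason very long reads matter so much.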

The finished genome is then submitted to public databases and made available through Ensembl.

“We want good quality data because we want to be a foundation for biology into the next century – we don’t want people to have to come back in 10 years and have to do it again,” Mark said.

“It’s a project that is only possible now. Advances in technologies made by Pacific Biosciences, Oxford Nanopore and others give us extremely long, accurate reads of DNA. We also have long-range data which allow us to put the sequence into chromosomes.”

“The other revolution is in computing,” Mark explained. Computer scientists have developed algorithms that can rapidly assemble genome sequence data. “In two months we’ve generated 44 genomes of really high quality.”

All the data will be published as soon as it is available, via Wellcome Open Research. The next species about to be published is the Eurasian otter, Lutra lutra. “In five years’ time we’ll be publishing about 20 papers a day.”

Fruits from the Tree of Life

The last speaker for the session was Professor Jenny Graves. Jenny is VC’s Fellow & Distinguished Professor, Ecology, Environment & Evolution at La Trobe University, Australia. She is also thinker-in-residence at the University of Canberra. She is (in)famous for her prediction that the human Y chromosome is disappearing.

She highlighted some of what we’ve learned from the handful of species sequenced so far. The ‘model’ species, including the mouse, fruit fly, worm and yeast, have taught us a lot: they can be manipulated experimentally, they have revealed how genes work, and they can stand in for other species. But they are ‘atypical’ examples. They were chosen as models because they grow well in laboratory conditions, so they aren’t always good representatives of other species.

“Model systems are great, but you never know where amazing insights are going to come from,” Jenny said.

She then asked whether we should really sequence some 26,000 species of nematodes. “The world is absolutely alive with nematodes,” she said.

Species across the tree of life have taught us about DNA itself. “DNA was first discovered in salmon sperm. And a lot of functions of DNA were discovered by looking at all kinds of species,” said Jenny.

Tardigrades have incredibly tough DNA, and bats have very efficient DNA repair mechanisms.

“Chromosomes were first noted in salamanders,” Jenny said. She also pointed to the tiny, gene-dense microchromosomes, similar in lancelets and chickens, as the origin of the larger, repeat-rich vertebrate chromosomes.

Sex

“The sex chromosomes were discovered in the mealworm,” said Jenny, and she went on to describe the discoveries of sex determination in mammals from species including the platypus and the mole vole.

But in other species, sex determination is very different. In several species, sex is determined by temperature – alligators are male when it’s hot, marine turtles are male when it’s cold. Some species can switch sex. Sequencing their genomes will help us understand those processes.

Jenny went on to discuss discoveries about telomeres and aging, cells, intelligence, food and agriculture, evolution and conservation. Another standout example was the hope that marsupials such as kangaroos could be a source of new antibiotics. It was impossible not to be awed by the diversity of life, and the promise of future discoveries.

Could kangaroo milk be a source of new antibiotics? Image credit: Terri Sharp via Pixabay

Find out more

Earth BioGenome project

Darwin Tree of Life

Jenny Graves

Harris Lewin

Mark Blaxter