To date, well over 200 Darwin Tree of Life genomes have been assembled and curated by the dedicated teams of bioinformaticians at Sanger – the vast majority of them in 2021.
The Tree of Life Assembly (ToLA) team and Genome Reference Informatics Team (GRIT) are responsible for turning masses of DNA data – those A, C, G and T bases – into beautifully assembled and curated genomes. Crucially, these are arranged in chromosomes, to reflect biological reality as closely as possible, before being released to the scientific community.
“We have received triple the number of curation requests compared to 2020,” explains Jo Wood, who heads GRIT. “As we do this, to meet the ambitious goals of the project, we are constantly looking for ways to increase throughput and improve turnaround whilst maintaining quality. A huge amount of effort has gone into streamlining, automating, and generally reducing the human hands-on time required.”
A "before and after" of a genome being curated by the Genome Reference Informatics Team using PretextView - notice how the "shrapnel" at the edges of the first picture has been placed in the correct sequence in the central diagonal (Image: Alan Tracey, Wellcome Sanger Institute)
The two teams work closely together to make sure the final genomes are of the highest possible quality. For example, this year GRIT spotted some missing data when curating an apple genome. The ToLA team then investigated and uncovered a couple of bugs in the programs.
“Our pipeline is working on a huge variety of different organisms, such as plants, worms and lepidoptera,” says Marcela Uliano-Silva, a senior bioinformatician on the ToLA team who has this year also written a tool for assembling mitochondrial DNA from PacBio HiFi reads. “To put into perspective how far the science has come: two decades ago the human genome was published, having taken 13 years, almost $3 billion and nearly 3,000 scientists. We’re now producing several new genomes per week, in much higher quality and at the chromosomal level.”
One of the genomes the Sanger team worked on was that of the super-stretchy ribbon worm, Lineus longissimus – which, at full length, is the longest animal in these islands. Its genome, however, is just an eighth the length of the human genome. Compare that to the mistletoe genome, which is 30 times larger than the human genome – and is one of the trickier genomes we expect to sequence and publish in 2022.
L. longissimus specimen collected by DToL at FSC Millport, next to a plot of its mitochondrial genome - assembled using the MitoHiFi tool (Images: Mark Blaxter & Marcela Uliano-Silva, Wellcome Sanger Institute)