The MBA has also optimised DNA extraction and PCR protocols for many different species of seaweed. To date, they have collected 34 common species. They are also starting to collect protists, a diverse group of mostly single-celled eukaryotic organisms that are not classified as animals, plants or fungi. Sixteen protist strains are currently being cultivated, and nine have been harvested for DNA extraction.
“Barcoding protocols are currently being developed at MBA by Helen Jenkins and Joanna Harley, and a wider conversation about cross-institutional protocols is occurring with the DToL project collaborators,” says Nova Mieszkowska, MBA Research Fellow. “The methods at MBA aim to firstly confirm identification to species level where possible, and secondly provide ‘deep’ phylogenetic information by methods such as building multigene trees.”
Data collection on the go
The Natural History Museum
In spite of the pandemic, the Natural History Museum (NHM) DToL team has had many highlights this year, including the successful development of a sample collection-to-barcode pipeline. The sampling team completed the arthropod species list and, once lockdown was lifted, resumed fieldwork trips. The team also undertook ad hoc collecting locally when possible. A total of 1034 samples have been collected and are now stored in the NHM Molecular Collection Facility.
The data management team worked hard to put a sample data pipeline in place, setting up the Epicollect mobile app for in-field sample data entry. The app helps to ensure that sample data can be exported to the DToL sample tracking system (based on COPO) and stored on the NHM collections management system.
A barcoding pipeline was also put in place: collected samples were successfully sequenced, the resulting barcodes were validated against the Barcode of Life Data System (BOLD) and the analysed data were then sent to the Sanger Institute. The NHM team is now fully trained to use their new Pacific Biosciences Sequel machine, and they will be validating this system to increase barcoding throughput going forward.
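To illustrate the idea behind validating a barcode against a reference library, here is a minimal sketch that scores a new barcode against known reference sequences by edit-distance identity and accepts the closest match only above a cut-off. This is not how BOLD's own identification engine works; the function names, the reference dictionary and the 98% threshold are all hypothetical, chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Fractional identity, scaled by the length of the longer sequence."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def best_match(query: str, references: dict, threshold: float = 0.98):
    """Return (species, identity) for the closest reference, or None if below threshold.

    The 0.98 default is a hypothetical cut-off; `references` must be non-empty.
    """
    species, score = max(
        ((name, identity(query, seq)) for name, seq in references.items()),
        key=lambda pair: pair[1],
    )
    return (species, score) if score >= threshold else None
```

For example, with `refs = {"Species A": "ACGTACGTAC", "Species B": "TTGTACCTAG"}`, an exact copy of the first sequence is assigned to Species A with identity 1.0, while a query differing at one of ten positions scores 0.9 and is rejected at the default threshold. Real barcodes are hundreds of bases long and are usually aligned before comparison.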
COPO: A big data broker for the DToL
“COPO is something quite special and unique that the science community has long been missing,” says Dr Seanna McTaggart, the Earlham Institute’s DToL Programme Manager. “For too long, data have been locked away in lab notebooks, or in files on a computer.”
COPO (Collaborative Open Omics) changes that.
COPO is a big data broker for life science. Developed by the Davey Group at Earlham, COPO takes care of uploading the metadata that are essential for contextualising genomic data. It's as simple as uploading a spreadsheet; COPO then does the rest, making sure that the data are submitted to the correct public repository. In the case of DToL, that is EMBL-EBI's European Nucleotide Archive (ENA).
Green algae colonies from an agar plate. Credit: Sally Warring
“COPO ensures that metadata are validated,” said Alice Minotto, Research Software Engineer at the Earlham Institute, in a recent interview. “This could be metadata such as taxonomy, which can be tricky as identifying organisms is not a fixed process. Names and species identification can change over time, and even within specific communities.
“Instead of having to check and submit this information manually, which would take a very long time, COPO automates the process. This makes it far less time consuming, easier, and eliminates errors.”
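As a rough illustration of the kind of checks a metadata broker can automate, the sketch below validates a spreadsheet exported as CSV: it confirms required columns exist, flags empty cells, checks that each scientific name matches its taxon ID, and checks the date format. The column names, the taxonomy lookup table and the species names are all hypothetical placeholders; COPO's real DToL manifest and its taxonomy checks are not reproduced here.

```python
import csv
import io
from datetime import date

# Hypothetical column names; COPO's real DToL manifest uses its own schema.
REQUIRED = ("SPECIMEN_ID", "SCIENTIFIC_NAME", "TAXON_ID", "COLLECTION_DATE")

# Hypothetical stand-in for a taxonomy service: taxon ID -> accepted name.
# These IDs and names are invented placeholders, not real taxonomy records.
TAXONOMY = {"100001": "Exempla prima", "100002": "Exempla secunda"}


def validate_manifest(text: str) -> list:
    """Return a list of human-readable problems found in a CSV manifest."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    errors = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        values = {col: (row[col] or "").strip() for col in REQUIRED}
        for col, value in values.items():
            if not value:
                errors.append(f"row {line_no}: {col} is empty")

        # Check the name against the (hypothetical) taxonomy lookup.
        taxon_id, name = values["TAXON_ID"], values["SCIENTIFIC_NAME"]
        expected = TAXONOMY.get(taxon_id)
        if taxon_id and expected is None:
            errors.append(f"row {line_no}: unknown TAXON_ID {taxon_id}")
        elif expected is not None and name and name != expected:
            errors.append(f"row {line_no}: {name!r} does not match taxon {taxon_id}")

        # Require ISO 8601 dates (YYYY-MM-DD).
        collected = values["COLLECTION_DATE"]
        if collected:
            try:
                date.fromisoformat(collected)
            except ValueError:
                errors.append(f"row {line_no}: COLLECTION_DATE {collected!r} is not YYYY-MM-DD")
    return errors
```

Returning every problem at once, rather than stopping at the first, mirrors the point made in the interview: the submitter gets one complete list to fix instead of checking each record by hand.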
To find out more about COPO, contact Dr Felix Shaw and Alice Minotto via the COPO website.
Large-scale sampling and tricky, slimy species
Wellcome Sanger Institute
It has been a tumultuous year for Sanger’s DToL team as they started to set up large-scale DNA sampling and sequencing pipelines from scratch, only for coronavirus to shut down scientific operations for several months. Caroline Howard, Scientific Manager for Sanger’s Tree of Life Programme, says the team has done an outstanding job.
“I think one of our biggest achievements has to be that we’re now properly up and running, despite the disruption of coronavirus. The support from our colleagues in sequencing operations has been amazing, particularly Elizabeth Cook, Craig Corton, Karen Oliver and Mike Quail.”
Sanger now has a fully functioning tracking system in which samples from the same specimen are submitted for each of the sequencing techniques required, at a rate of 20 to 30 species per week. People may think that extracting and sequencing DNA is the same for all families and species, but in fact different taxa pose different challenges that have to be solved each time.
The blue-rayed limpet (Patella pellucida) was one of the species sequenced using Sanger’s new DNA pipeline. Credit: Mark Blaxter
“We’ve had a lot of success processing butterfly and moth samples this year, but slimy species such as molluscs continue to be tricky. But we’ve come a long way. A great example of how far our pipelines have come is Patella pellucida, the blue-rayed limpet. This sample was collected by Sanger faculty at Millport, Scotland at the end of August. Within five weeks, it had been received in the lab, gone through sample management, validated using COPO, put through our protocols for DNA extraction and sub-sampling, and submitted for sequencing.”
“We’re now assembling all of the data to reference genome standard. I think this represents an impressive turnaround time from collection to reference genome, and stands us in good stead to scale up in the year ahead.”
At the end of the year, the Sanger teams celebrated the formal release of the first 30 DToL species’ genome sequences to the European Nucleotide Archive. These assemblies are of uniformly high quality, with all the sequences assigned to chromosomes. Hundreds more are now in the sequencing, assembly and curation pipeline.
Illuminating nature’s dark matter: Protists and single cell genomics
Earlham Institute and University of Oxford
Protists make up the overwhelming majority of eukaryotic life but until now have remained relatively understudied. Researchers in the Hall group at the Earlham Institute and the Tom Richards lab at the University of Oxford are changing that, aiming to sample and decode the breadth of protist diversity across the British Isles.
That’s no easy task. ‘Protist’ is a word that describes a staggering range of lifeforms, some with genomes as small as those of bacteria, while others boast far greater complexity than the human genome. At Earlham, Dr Sally Warring has been working with the Single Cell Genomics team to coax the genetic information from this mysterious myriad of lifeforms.
Sally Warring out in the field. Credit: Earlham Institute
“Protists are so variable,” Warring explained in a recent interview. “Some have thick cell walls, some have glass cell walls, some have silica scales on them, some have starch – all these different things going on with their cell chemistry. This all makes DNA extraction, or the ability of an enzyme to work, highly variable.
“What I’m doing now is culturing protists to use Hi-C [a chromosome conformation capture technique], which looks at the proximity of DNA sequences to each other to get a better idea about the structure of genomic sequences. We’re trying to establish this in our single cell pipeline, possibly from metagenomic samples, to get better single cell genomes.”
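The core signal Hi-C provides can be shown with a small sketch: read pairs that link two different pieces of assembled sequence (contigs) suggest those pieces sit close together in the nucleus, and most often next to each other on the same chromosome. The input format here (read pairs already mapped to contig names) and the function names are assumptions for illustration; real Hi-C scaffolding tools also normalise for contig length and sequence biases, which this sketch does not.

```python
from collections import Counter


def contact_counts(read_pairs):
    """Count Hi-C read pairs linking two different contigs (unordered pairs)."""
    counts = Counter()
    for contig_a, contig_b in read_pairs:
        if contig_a != contig_b:  # pairs within one contig carry no joining information
            counts[tuple(sorted((contig_a, contig_b)))] += 1
    return counts


def strongest_partner(counts):
    """For each contig, the other contig it shares the most Hi-C links with."""
    best = {}
    for (a, b), n in counts.items():
        for contig, other in ((a, b), (b, a)):
            if contig not in best or n > best[contig][1]:
                best[contig] = (other, n)
    return {contig: other for contig, (other, _) in best.items()}


# Hypothetical mapped read pairs for four contigs.
pairs = ([("ctg1", "ctg2")] * 5 + [("ctg2", "ctg3")] * 2
         + [("ctg1", "ctg1")] * 10 + [("ctg3", "ctg4")] * 6)
print(strongest_partner(contact_counts(pairs)))
```

Here the strong links pair ctg1 with ctg2 and ctg3 with ctg4, while the weak ctg2–ctg3 link hints at how the two pairs might be joined into a longer scaffold.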
Rapid access to the DToL genomes
EMBL’s European Bioinformatics Institute (EMBL-EBI)
One important goal of the DToL project is to make all of the newly sequenced genomes fully accessible to all researchers. Every genome sequence from the DToL project will be freely available through the European Nucleotide Archive (ENA), a database run by EMBL’s European Bioinformatics Institute (EMBL-EBI). Each genome will also be annotated, stored and made available through the Ensembl genome browser. Both the ENA and Ensembl have made significant changes to their underlying processes to be as efficient as possible and keep up with the enormous scale of the DToL project.
These changes, driven by a need for rapid access to genome annotations at scale, led to the launch of Ensembl Rapid Release. Rapid Release is a lightweight, scalable version of the Ensembl genome browser designed to house annotations for species from DToL and other sequencing efforts.
Unlike the main Ensembl website, which updates every three months, Rapid Release is updated every two weeks with new species and annotations. As a result, downstream research can begin within weeks of the annotation being finalised – a huge benefit to the DToL project and beyond as the number of genomes begins to ramp up.
“Five months after the launch of Ensembl Rapid Release, we already have over 170 genomes from DToL and other projects,” says Fergal Martin, Vertebrate Annotation Coordinator at EMBL-EBI. “As we get more genomic and transcriptomic data from DToL we can now roll out the annotations on Rapid Release.”
These are some of the amazing achievements made by the DToL project this year, and this is just the beginning. Thousands of new genomes will be sequenced in the coming years as the DToL project gears up to sequence entire ecosystems.
As the DToL project expands to collect and sequence more species, researchers can expect to see more new genomes released and made freely accessible. In the near future, the DToL project will also provide a great opportunity to bring people closer to nature and give us a better understanding of how we can protect our planet.