Tell us about your work
I curate the Genomes on a Tree (GoaT) database, developed here at Sanger and used extensively by the Earth BioGenome Project (EBP). The database collates genome-relevant information for all eukaryotic species, including those targeted by sequencing projects around the world. Together, these projects aim to sequence the genomes of all described eukaryotic life, an estimated 1.8 million species.
So we need to coordinate who is sequencing which species. The database tracks what is in progress and what has been collected. Part of my role is reaching out to all these different projects and pulling in their data in a way that lets people look up information on their favourite species.
We also collect all data that is relevant to the genomes being sequenced. That includes chromosome number, the sex determination system for a particular species, the genome size, which species have already been sequenced and by which project, and the genome assembly size. This isn’t just published data, but work in progress too. We pull in data from Sanger’s in-house software systems, for example.
The database also infers values that are missing for a species, based on the data from its closest available relative. Everything is literally placed on a phylogenetic tree of life. It is constantly updated, so the estimates become increasingly accurate as more branches gain real data. Users can interrogate the data as well.
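To make the idea of inferring from the closest relative concrete, here is a minimal sketch in Python. It is an illustration only and not GoaT's actual method: the taxon names, the genome sizes, and the choice of the median as the summary are all assumptions made for the example. For a species with no direct value, it climbs the tree to the nearest ancestor that has measured descendants and summarises their values.

```python
from statistics import median

# Hypothetical toy tree (child -> parent). Names are placeholders, not real taxa.
PARENT = {
    "species_a": "genus_x",
    "species_b": "genus_x",
    "species_c": "genus_y",
    "species_d": "genus_y",
    "species_e": "genus_w",
    "genus_x": "family_z",
    "genus_y": "family_z",
    "genus_w": "family_z",
}

# Made-up directly measured genome sizes, in megabases.
DIRECT = {"species_a": 400.0, "species_c": 900.0, "species_d": 1100.0}


def descendants(node, parent):
    """Return every taxon that sits below `node` in the tree."""
    found = []
    stack = [node]
    while stack:
        current = stack.pop()
        for child, par in parent.items():
            if par == current:
                found.append(child)
                stack.append(child)
    return found


def estimate(taxon, parent=PARENT, direct=DIRECT):
    """Return (value, source): source is 'direct' or the ancestor used."""
    if taxon in direct:
        return direct[taxon], "direct"
    node = parent.get(taxon)
    while node is not None:
        # Collect measured values from all relatives under this ancestor.
        values = [direct[d] for d in descendants(node, parent) if d in direct]
        if values:
            return median(values), node
        node = parent.get(node)  # nothing measured here, climb one level
    return None, None  # no measured relative anywhere in the tree


print(estimate("species_b"))  # (400.0, 'genus_x')
print(estimate("species_e"))  # (900.0, 'family_z')
```

In this sketch, adding a direct value for any species under `genus_w` would make the estimate for `species_e` come from that genus rather than the whole family, which is the sense in which estimates sharpen as more branches gain real data.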
For scientists here, the estimates of genome size can be really useful. They help with planning the resources that will be needed to sequence each species.
I work closely with the Blaxter lab, especially Rich Challis, the main GoaT developer, and Sujai Kumar, who helped build and maintain the database and its connections to other data sources (the APIs). I also help users by teaching them how to use the platform, and I interact with GoaT stakeholders to discuss new features and feedback.