

Dr Cibele Sotero-Caio is a Genomic Data Curator at the Wellcome Sanger Institute. She looks after the Genomes on a Tree (or GoaT) database, which will hold data for the hundreds of thousands of species that are currently having their genomes sequenced for the first time.
Free and open, GoaT tracks projects around the world, pulling in information about which eukaryotic species are being sequenced, and what their genomes look like. It’s used for both planning and research, and is an increasingly rich source of knowledge about the world’s genomes.
Why did you become a scientist?
I’m from Brazil, and when I was younger I really liked to spend time outside. I would find a tree and I would just scratch it to see what kind of weird animals I could find. I’d try to find eggs and then watch them, follow them, to see what happens, until they hatched. Suddenly, you’re looking at a lizard. I’ve always loved animals.
As I was taking the test to enter University, I was thinking that maybe I should go into language school or marketing. I had no idea that my vocation was actually biology, but I had applied for biology at my local University, as I wasn’t sure about travelling out of my home town to study languages. I got in and thought I should at least start, and I could always switch courses later on.
I remember it like it was yesterday; my first class was about coral, and we looked at coral polyps under the stereoscope. I was just like, okay, this is amazing. If this is what it means to be a biologist, I'm going to be a biologist.
Tell us about your work
I curate the Genomes on a Tree database, developed here at Sanger, and extensively used by the Earth BioGenome Project (EBP). The database collates genome-relevant information from all eukaryotic species, including those targeted by sequencing projects going on around the world. Together, the groups are aiming to sequence the genomes of all described eukaryotic life, an estimated 1.8 million species.
So we need to coordinate who is doing which species. The database is used to track what is in progress, and what has been collected, and part of my role is reaching out to all these different projects, pulling in their data in a way that people can look for and find the information on their favourite species.
We also collect all data that is relevant to the genomes being sequenced. That includes chromosome number, the sex determination system for a particular species, the genome size, which species have already been sequenced and by which project, and the genome assembly size. This isn’t just published data, but work in progress too. We pull in data from Sanger’s in-house software systems, for example.
The database also infers things, based on the data from the closest available relative for a species. Everything is literally placed on a phylogenetic tree of life. It is constantly updated, so the estimates are getting increasingly accurate as more branches have real data. Users can interrogate the data as well.
For scientists here, the estimates of genome size can be really useful. It helps with planning the resources that will be needed for sequencing each species.
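As a rough illustration of the idea of inferring a value from a species' closest relatives, here is a minimal Python sketch. It is not GoaT's actual method, which is not described here: the toy taxonomy, the placeholder species names, the genome sizes and the choice of a median are all assumptions made up for the example.

```python
from statistics import median

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "Eukaryota": None,
    "Chiroptera": "Eukaryota",
    "Coleoptera": "Eukaryota",
    "bat_A": "Chiroptera",
    "bat_B": "Chiroptera",
    "bat_C": "Chiroptera",
    "beetle_A": "Coleoptera",
    "beetle_B": "Coleoptera",
}

# Made-up "directly measured" genome sizes in megabases (illustrative only).
MEASURED = {"bat_A": 2000.0, "bat_B": 2200.0, "beetle_A": 500.0}


def descendants(taxon):
    """Return every taxon below `taxon` in the toy tree."""
    result = []
    for child, parent in PARENT.items():
        if parent == taxon:
            result.append(child)
            result.extend(descendants(child))
    return result


def estimate(taxon):
    """Return (value, source) using a direct value or the nearest relatives."""
    if taxon in MEASURED:
        return MEASURED[taxon], "direct"
    node = taxon
    # Walk up the tree until some branch contains measured relatives.
    while node is not None:
        values = [MEASURED[d] for d in descendants(node) if d in MEASURED]
        if values:
            # Assumption: summarise the relatives with a median.
            return median(values), f"estimated from {node}"
        node = PARENT.get(node)
    return None, "no data"


print(estimate("bat_C"))
print(estimate("beetle_B"))
```

In this sketch, a species with no measurement borrows an estimate from the smallest group that contains measured relatives, so the estimate changes as more branches gain real data.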
I work closely with the Blaxter lab, especially Rich Challis, the main GoaT developer, and Sujai Kumar, who helped build and maintain the database and its connections to other data sources (the APIs). I also help users, teaching them how to use the platform, and I interact with GoaT stakeholders to discuss new features and feedback.
What is the best part of your job?
In my first role after University, I started as a field biologist, studying the chromosome evolution of bats. So I know about bats, and in terms of chromosomes, they can be kind of boring. I know about beetles too. But I didn’t have a wide perspective on karyotypic diversity across the tree of life.
Looking at all of this new data, all this diversity, and all the different ways that life can do things is what impresses me most.
I'm drawn to biology and try to define patterns - but observing and understanding exceptions is one of the coolest things about being a biologist. Because of my work here, I've seen so many exceptions to things that I thought were true. And that is actually what I enjoy the most: daily data analysis is never boring.
What are the challenges?
I think it’s that I need to convince people that it is cool to have your data in GoaT as early as possible, even if it needs refining later! We’re keen to avoid duplication and to prioritise the work so that we get the most synergy between all the sequencing projects in the world. Sometimes it’s hard to explain that importing even a preliminary list onto the platform might actually speed up the definition of their final target lists.
Then, it’s getting everything standardised. This standardisation enables us to accurately display information from multiple sources - I have to understand what data others are showing and convert it to the format we are using. But, with groups all over the world, there are many words for the same thing. I think the scariest part is that I have the responsibility to standardise terms! But we are now creating a scientific advisory board to help define ontology and help with other decisions within GoaT.
Is there a word or phrase that is overused in your team?
Not a word per se, but after a quick peek at the GoaT-related Slack channels I just realised that we say 'yay!' a lot! Every little addition to GoaT is a celebration =)
If you could time travel to anywhere, where would you go?
I have been asked this before and I can never think of specific points in the past (other than concerts of my favourite bands). I'd rather see how research would be done in the future instead. So let's say, right here in 200 years, maybe?
Find out more
- Earth BioGenome Project: https://www.earthbiogenome.org/
- GoaT page: https://goat.genomehubs.org/
- Darwin Tree of Life Project page on GoaT: https://goat.genomehubs.org/projects/DTOL