I am working in the Tree of Life Programme, where I am looking for the genes in eukaryote genomes that code for the proteins of protein complexes.
I’ll do this by adding more protist genomes and then searching the genomes of a range of organisms (protists, animals, fungi and plants) for orthologs: genes that are equivalent to genes already known to code for the proteins of complexes in other organisms. I’ll also look at different parts of these proteins and use machine learning to identify other proteins that are likely to form complexes in organisms yet to be studied. Ultimately, I hope to summarise my findings in a database for other researchers to use.
Genomes hold genes, the instructions that the cell uses to make proteins. Protein sequences and complexes share similarities across species, and tracing their evolutionary history shows how some of them are related.
Protists are (mostly) single-celled eukaryotes. Like us, plants and fungi, they are considered ‘higher’ life-forms compared with non-eukaryotes such as bacteria. Although protists vary far less in size than animals, plants and fungi, the majority of eukaryote genome sequence diversity lies within this group. Comparing protein sequences across eukaryotes therefore requires protist genomes as well as animal, fungal and plant genomes.
However, the scientific community is selective about which organisms have their genomes sequenced, guided largely by economic relevance and disease. Protist genomes in particular are underrepresented relative to other eukaryote genomes. As part of the Darwin Tree of Life project, we will add more protist genomes, and we will probably add protist genomes from outside the UK as well.
By filling gaps in our understanding of protein complexes in protists, I aim to predict which proteins made up the complexes present in the ancestor of all eukaryotes.