Genomes within Genomes

Hiding among the A’s, T’s, C’s and G’s of a newly sequenced red colobus monkey genome was a second species. A serendipitous database search led to a collaboration across institutes and to the new genome sequence of a parasite descended from the parasites that cause malaria.

Ugandan red colobus (Procolobus tephrosceles), Kihingami Wetland, Uganda. Image credit: Charles J Sharp / CC BY-SA

Less than one per cent of all complex life has had a genome sequenced to date, but global efforts to sequence every single species on the planet are accelerating. Databases are growing as genomics becomes the next ‘Big Data’ science. 

Such freely accessible data is open for others to analyse, and in some cases it holds more than first meets the eye.

Serendipitous searching

Dr Theo Sanderson, a postdoctoral researcher at the Crick Institute in London, studies the Plasmodium parasite species that cause malaria. He was studying a gene with no known function (a common problem for biologists), so he turned to the first tool of many a computational biologist and ran a BLAST search.

He was looking in the databases to see if any other species had a similar gene. He found it, as expected, in related Plasmodium species. But there was also a related gene in an endangered Ugandan red colobus monkey.

Did the monkey have malaria? The genome had been recently sequenced from the blood sample of a wild monkey in Uganda’s Kibale National Park, so it seemed a possibility. However, some reading around showed that Plasmodium species don’t infect these monkeys, though a related parasitic species, Hepatocystis, does. A quick analysis suggested that lurking among the monkey DNA was a substantial amount of sequence from this parasite. Theo wrote up his findings in a blog post and tweeted the researchers who published the monkey genome, led by Dr Noah Simons at Duke University in the US.

The answer to Theo’s tweet was no, and so the story might have ended there, but Dr Adam Reid, Senior Staff Scientist at the Sanger Institute, had spotted Theo’s blog too. He works on parasites from across the tree of life, including malaria-causing Plasmodium, and was keen to find out more about the Hepatocystis data – no Hepatocystis species had been sequenced before. Adam got in touch with Noah, too.

New partnerships

The US team knew that the monkeys had recurrent parasite infections, and that the data they had contained stretches of parasite genome.

Adam and his team got the full raw sequence data from Noah’s group and set to work putting together the Hepatocystis genome. The data was sufficient to tease out and assemble a new, largely complete sequence.

The team’s research paper with the Hepatocystis sequence is published in the journal PLOS Pathogens today.

Malaria relation

Hepatocystis, while similar to the Plasmodium species that cause malaria, has crucial differences.

“After piecing together the genome, we compared the sequences of Hepatocystis and Plasmodium. We could trace their evolutionary history, and confirm that Hepatocystis is descended from Plasmodium,” said Dr Eerik Aunin, Senior Bioinformatician at the Sanger Institute, who led the research.

Unlike Plasmodium, which travels from one host to the next via mosquitoes, Hepatocystis is transmitted by biting midges. “We found rapid evolution of genes whose equivalents in Plasmodium have increased activity in mosquito stages of the life cycle. These genes are likely involved in interactions between Hepatocystis and the midges. Many of the genes with unclear function that differ a lot between Hepatocystis and Plasmodium might be important for understanding interaction between malaria parasites and mosquitoes,” added Eerik.

“The most interesting thing to me was the difference in their life cycle,” said Dr Adam Reid, who is senior author of the study. “Plasmodium species cause disease when they rapidly replicate in the blood of their host, killing red blood cells in the process. But Hepatocystis doesn’t have this stage in its life cycle. It replicates in the liver. We discovered some key genes involved in invasion of red blood cells were absent from our Hepatocystis data, which suggests these genes have been lost as it evolved.”

The replicative stage of Plasmodium, because it causes the symptoms of malaria, is intensely studied by parasitologists. The team hopes that understanding the difference between the two sets of species will enable them to understand more about how malaria causes disease.

“Hepatocystis has a fascinating evolutionary story and is a powerful comparator for understanding malaria parasite biology,” said Adam.

“I think this has been a great example of 21st-century research. Open data shared by scientists working in a completely different area, highlighted by a blog post, has ultimately led to important insights into fundamental parasite biology,” said Dr Theo Sanderson, Postdoctoral Associate at the Crick Institute.


Noah Simons, Duke University

Theo Sanderson, Crick Institute

Adam Reid, Sanger Institute

NCBI BLAST