6th February 2014
By Moritz Gerstung
The red and blue lines denote the rates of different sequencing errors; at these positions it is difficult to detect a true mutation. The height of the green line measures how often a mutation was observed. Each bubble denotes a patient for whom we have found a variant at the given position.
Over the past few years, the Cancer Genome Project at the Wellcome Trust Sanger Institute and others have found increasing evidence that cells in the same tumour are not all identical. Finding the differences is crucial to diagnosis and treatment but it’s not easy. To overcome the challenge, we’ve copied the tactics of one of nature’s most efficient hunters.
There are often genetic differences between cancer cells, as some cells have already acquired mutations that other cells don’t have. The consequence is that a single tumour can comprise many different genetic diseases at the same time, which makes it even more difficult to combat.
Being able to detect different clones in the same tumour is very important if we are to accurately diagnose a patient, as some of the subclones may be more advanced and require different treatment. Recently, we were able to demonstrate that mutations occurring in only a subset of blood cancer cells influence survival as strongly as if they were already present in every cancer cell.
Unfortunately, it is technically challenging to find mutations present in only a small fraction of cells. The reason for this is that the sequencing technology and bioinformatics tools we use sometimes produce errors that look just like mutations carried by a small number of cells. Even very small error rates add up when we analyse millions of sites in the genome.
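To give a feel for how small error rates add up, here is a back-of-the-envelope calculation in Python. The numbers are purely illustrative assumptions (one million sites, and a one-in-100,000 chance of a false call per site); they are not measured rates from our data.

```python
def expected_false_calls(n_sites, false_call_rate):
    """Expected number of spurious calls if every site errs independently."""
    return n_sites * false_call_rate


def prob_at_least_one_false_call(n_sites, false_call_rate):
    """Chance of seeing one or more spurious calls across all sites."""
    return 1.0 - (1.0 - false_call_rate) ** n_sites


# Illustrative assumptions only, not measured values.
N_SITES = 1_000_000
FALSE_CALL_RATE = 1e-5

if __name__ == "__main__":
    print(expected_false_calls(N_SITES, FALSE_CALL_RATE))
    print(prob_at_least_one_false_call(N_SITES, FALSE_CALL_RATE))
```

Under these assumed numbers we would expect about ten false calls per sample, and it is almost certain that at least one turns up.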
To overcome these challenges we have developed a new algorithm that is based on two new ideas. Firstly, we run it on a large panel of samples to find out which positions in the genome are more likely to have sequencing errors; at these positions, the algorithm becomes more cautious in reporting a mutation. Secondly, we feed in data from previous sequencing studies, telling the algorithm which sites in a gene are more likely to be mutated. These sites are often very specific and are the places where we should look more carefully for genomic variants.
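The two ideas can be illustrated with a minimal Python sketch. This is not the Shearwater implementation: it assumes a plain binomial error model and a best-fitting allele fraction for the "real mutation" case, which is a deliberate simplification of the published method. All function names and numbers below are hypothetical and chosen for illustration only.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log-probability of k variant reads out of n at rate p."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def site_error_rate(panel, pseudo=0.5):
    """Idea 1: estimate a position's error rate from a panel of
    (variant_reads, depth) pairs taken from other samples."""
    alt = sum(a for a, _ in panel)
    depth = sum(d for _, d in panel)
    return (alt + pseudo) / (depth + 2 * pseudo)


def posterior_mutation(k, n, error_rate, prior):
    """Idea 2: combine the evidence with a prior probability that this
    site is mutated (higher at known hotspots)."""
    vaf = max(k / n, error_rate)  # best-fitting allele fraction (assumption)
    log_l_error = log_binom_pmf(k, n, error_rate)
    log_l_mutation = log_binom_pmf(k, n, vaf)
    log_odds = log(prior) - log(1 - prior) + log_l_mutation - log_l_error
    return 1.0 / (1.0 + exp(-log_odds))


# Hypothetical panels: 50 other samples at 1000x depth each.
clean_site = [(0, 1000)] * 50   # almost never shows errors
noisy_site = [(10, 1000)] * 50  # shows about 1% errors

if __name__ == "__main__":
    for name, panel in [("clean", clean_site), ("noisy", noisy_site)]:
        rate = site_error_rate(panel)
        for prior in (1e-4, 0.1):  # ordinary site vs. assumed hotspot
            print(name, prior, posterior_mutation(12, 1000, rate, prior))
```

In this toy example the same observation, 12 variant reads out of 1000, is called confidently at the clean position but treated with caution at the noisy one, and a higher prior at an assumed hotspot raises the posterior probability. Because the sketch plugs in the best-fitting allele fraction, it overstates the evidence for a mutation compared with a full Bayesian treatment.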
We have called this algorithm Shearwater after the seabirds. Shearwaters are very efficient hunters: they fly long distances, watching the water, before diving in to pick up fish. Often, this is done with the help of whales that push shoals of mackerel close to the surface, making them easier for the birds to catch. The algorithm works in a very similar way, ensuring that we are always hunting for mutations in the right areas.
We have applied this algorithm to a cohort of 683 patients with myelodysplasia, a blood cancer, and found that in comparison to other tools it detected many additional mutations and correctly discarded many sequencing errors. As genetics has a large influence on outcome in cancer, this allowed us to deliver a more accurate prognosis than was possible with existing tools. This demonstrates not only the capabilities of the algorithm, but also its potential use in clinical sequencing.
Moritz Gerstung is a Postdoctoral Fellow in the Cancer Genome Project at the Wellcome Trust Sanger Institute. With Peter Campbell he works on bioinformatics algorithms for analysing and understanding sequencing data from cancer patients.
- Gerstung M, Papaemmanuil E and Campbell PJ (2014). Subclonal variant calling with multiple samples and prior knowledge. Bioinformatics. doi: 10.1093/bioinformatics/btt750
- Papaemmanuil E, Gerstung M, Malcovati L, et al. (2013). Clinical and biological implications of driver mutations in myelodysplastic syndromes. Blood. doi: 10.1182/blood-2013-08-518886