14th November 2013
by Vincenza Colonna
Prioritisation of candidate non-coding cancer drivers based on patterns of selection. [DOI: 10.1126/science.1235587]
Why does only two per cent of the genome do interesting stuff while the rest just sits there? This has been a longstanding question in genomics and, until a large number of genomes became available, it was a very difficult one to answer.
I am interested in human evolution, and when I joined the Wellcome Trust Sanger Institute in 2010 I was one of those scientists fascinated by the opportunity that the data set from the 1000 Genomes project offered to answer this question. So, alongside scientists from 34 other institutes, I set about exploring the non-coding regions, the other 98 per cent of the genome.
To do this, we combined information about genome functionality from the ENCODE project with information about human genetic variation discovered in 1,092 individuals in the 1000 Genomes project. By comparing these data sets, we identified both regions that hardly change among individuals, because any variation would compromise essential functions, and regions where changes are so beneficial that they spread rapidly through time in some populations.
Those patterns of variation were well known for coding regions, but only a few cases were known for non-coding regions. By systematically scanning the non-coding genome, we demonstrated that it has functional relevance as well. We also showed that some non-coding regions occupy central positions in gene regulatory networks.
All of this has become possible only now that systematic genomic annotations and global genetic variation information are both available for the first time.
The project was extremely collaborative, involving 48 scientists around the world. For me, it was extremely exciting to be there right at the time when it was happening.
Vincenza is a research scientist at the National Research Council in Italy and a visiting scientist in the Human Evolution team at the Sanger Institute. She works on several projects to understand the processes that lead to the current levels and distribution of genomic variation in humans.
- Khurana E, Fu Y, Colonna V et al (2013) Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics. Science.