Using big data to understand small data

30 May 2014
By Alistair Rust

Different genomic landscapes are generated by integrating many large biological datasets. These landscapes provide further evidence as to whether an insertion site may be driving the formation of tumours or is simply an unimportant bystander. Credit: DOI: 10.1371/journal.pgen.1004250

Disrupting genes to understand their roles in cancer is a technique used by many molecular biology laboratories around the world.

One approach (explained in this blog) is to use short sequences of DNA, either transposons or retroviruses, that integrate into the genome and cause genes to malfunction.

However, it can be tricky to know which points of integration are valuable, and that’s where large, complementary datasets can help by providing useful comparisons.

Not all of the places in the genome that these DNA-integrating elements hop into cause genes to malfunction. Some are simply hotspots: regions of the genome in which, for some reason, transposons prefer to cluster. Such regions therefore need to be ignored when trying to identify the truly disrupted regions that are driving the formation of tumours.

In a recent study led by collaborators at the Netherlands Cancer Institute in Amsterdam and the Delft University of Technology, also in the Netherlands, a wide range of publicly available epigenomic datasets was combined using statistical methods to generate landscapes of molecular signals. These landscapes help to better understand the patterns of transposon and retrovirus insertions found in cancer.

The generated genomic landscapes are a valuable resource because they can be used to filter hotspots out of transposon- and retrovirus-based cancer studies. This study is another example of using systems biology approaches to analyse big biological datasets to better understand the complexities of smaller datasets.
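
For readers who like to see the idea in code, the filtering step can be sketched in a few lines of Python. This is only an illustration and not the method used in the study, which builds statistical landscapes from many datasets. It assumes the hotspots have already been reduced to a simple list of genomic intervals, and the names `Region`, `in_any_region` and `filter_hotspots`, along with all of the coordinates, are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A hypothetical hotspot interval: start is inclusive, end is exclusive."""
    chrom: str
    start: int
    end: int


def in_any_region(chrom, pos, regions):
    """Return True if the position falls inside at least one region."""
    return any(r.chrom == chrom and r.start <= pos < r.end for r in regions)


def filter_hotspots(insertions, hotspots):
    """Keep only the insertion sites that lie outside every hotspot."""
    return [(chrom, pos) for chrom, pos in insertions
            if not in_any_region(chrom, pos, hotspots)]


# Toy data, invented purely for illustration.
hotspots = [Region("chr1", 1000, 2000), Region("chr2", 500, 800)]
insertions = [
    ("chr1", 1500),  # inside the chr1 hotspot
    ("chr1", 2500),  # outside any hotspot
    ("chr2", 799),   # last position inside the chr2 hotspot
    ("chr2", 800),   # just outside, because the end is exclusive
    ("chr3", 1500),  # chromosome with no hotspots
]

kept = filter_hotspots(insertions, hotspots)
print(kept)
```

The linear scan over every hotspot is fine for a toy example; a real analysis with many thousands of insertions would more likely use an interval tree or a sorted lookup.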

Alistair Rust is a Principal Bioinformatician in Dave Adams’ group at the Wellcome Trust Sanger Institute, uncovering cancer genes using mouse models.
