17 May 2013
By Ludmil Alexandrov
Cancer, the most common human genetic disease, is caused by changes to the DNA sequence known as somatic mutations. In turn, these somatic mutations are caused by the mutational processes that are operative in every cell of the human body. Some of these processes are intrinsic (e.g., DNA replication errors) and have been present since the very first division of the fertilized egg. Others become sporadically active throughout our lifetimes and depend on the environment (e.g., living in a house built with asbestos) and lifestyle choices (e.g., cigarette smoking or diet). The constant operation of mutational processes within our bodies and intermittent exposure to mutagens from our environment result in the gradual accumulation of somatic mutations in the genome of every cell. Cells counteract damage to their genomes by employing a plethora of DNA repair mechanisms. Nevertheless, the molecular battle to keep the genetic code intact is slowly lost by at least some cells, resulting in mutations that affect cellular function and, occasionally, in cancerous cells that evade normal cellular constraints and divide uncontrollably.
Different mutational processes can cause different and unique patterns of mutations, which we call “mutational signatures.” Perhaps the best-known examples are the patterns induced by exposure to ultraviolet light and to tobacco smoke, which change DNA bases in characteristic ways: UV exposure predominantly mutates cytosine to thymine, whereas tobacco smoke predominantly mutates cytosine to adenine.
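To make the idea of a mutation pattern concrete, here is a minimal Python sketch that tallies base substitutions into the six classes commonly used when describing signatures (C>A, C>G, C>T, T>A, T>C, T>G). The function names and the toy mutation list are hypothetical and serve only as an illustration; this is not the published tool.

```python
from collections import Counter

# Watson-Crick complements. Every substitution is expressed from the
# pyrimidine (C or T) strand, so G>A is counted in the same class as C>T.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Return a class label such as 'C>T' with a pyrimidine reference base."""
    if ref == alt:
        raise ValueError("reference and mutated base must differ")
    if ref in ("A", "G"):  # purine reference: switch to the other strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(mutations):
    """Count how often each substitution class occurs in (ref, alt) pairs."""
    return Counter(substitution_class(ref, alt) for ref, alt in mutations)


# Hypothetical toy data: a sample dominated by cytosine-to-thymine changes,
# the kind of pattern associated with UV exposure.
uv_like_sample = [("C", "T"), ("G", "A"), ("C", "T"), ("T", "C")]
spectrum = mutation_spectrum(uv_like_sample)
```

In this toy sample, three of the four mutations fall into the C>T class once the G>A change is read from the opposite strand.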
Sequencing the genome of a cancer gives a researcher the complete archeological record of the mutational signatures of all processes that have been operative at one point or another during the cellular lineage between the fertilized egg and the cancer cell. The challenge is to take a set of cancers and identify all the mutational processes present in them. We simplified this problem by treating it as analogous to the cocktail party problem, in which multiple people attending a party are speaking simultaneously while several microphones placed at different locations record the conversations. Each microphone captures a combination of all sounds, and the problem is to identify the individual conversations from all the recordings. This becomes possible because each microphone captures each conversation with a different intensity, depending on the distance between the microphone and the conversation. Similarly, a cancer genome provides only the final mixture of the signatures of all mutational processes operative in a cancer sample, and the goal is to identify these signatures from a set of available mixtures.
This type of “cocktail party” problem is known as a blind source separation problem and can be solved by applying different mathematical methods. In our study, we used a method called nonnegative matrix factorization. This method is quite powerful because it can separate meaningful parts from a large set of data. For example, if one applies nonnegative matrix factorization to passport photos, it can separate eyes, noses, ears, and other meaningful facial features. Similarly, our approach separates the biologically meaningful mutational processes present in cancer. The resulting computational resource can be used by scientists around the world to characterize the signatures of mutational processes in cancer samples. We extensively evaluated our framework with simulated and real data, demonstrating that it is applicable to different types of cancer data and that it can incorporate a wide variety of mutation types. Currently, our tool is being applied to genomics data from thousands of patients to reveal the signatures of mutational processes in human cancer.
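The core idea of nonnegative matrix factorization can be sketched in a few lines of Python. The example below is a simplified illustration, not our published framework: it assumes the standard multiplicative update rules for the least-squares objective, and every name and number in it (the two toy signatures, the exposures, the function names) is hypothetical. A matrix of mutation counts V (mutation classes by samples) is approximated as W times H, where the columns of W play the role of signatures and H holds how strongly each signature contributes to each sample.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Euclidean (Frobenius) distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2
               for row_v, row_wh in zip(v, wh)
               for x, y in zip(row_v, row_wh)) ** 0.5


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Factor V into nonnegative W and H using multiplicative updates."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    initial_error = frobenius_error(v, w, h)
    for _ in range(iterations):
        # Update H: H <- H * (W^T V) / (W^T W H), element by element.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # Update W: W <- W * (V H^T) / (W H H^T), element by element.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h, initial_error


# Hypothetical toy signatures over six substitution classes
# (C>A, C>G, C>T, T>A, T>C, T>G); the numbers are invented.
sig_a = [0.05, 0.05, 0.70, 0.05, 0.10, 0.05]  # dominated by C>T
sig_b = [0.60, 0.10, 0.10, 0.10, 0.05, 0.05]  # dominated by C>A
# Invented exposures: mutations contributed by each signature per sample.
exposures = [(100, 0), (0, 80), (60, 40), (20, 90), (50, 50)]
catalog = [[a * ea + b * eb for ea, eb in exposures]
           for a, b in zip(sig_a, sig_b)]

w, h, initial_error = nmf(catalog, rank=2)
final_error = frobenius_error(catalog, w, h)
```

Because the factors can never become negative, each sample is explained as a purely additive mixture of signatures, which matches the biology: mutational processes add mutations to a genome and never subtract them. In this sketch the number of signatures (`rank`) is chosen by hand; estimating it from the data is a separate problem that the sketch does not address.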
Alexandrov, L. B., Nik-Zainal, S., Wedge, D. C., Campbell, P. J. & Stratton, M. R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep 3, 246-259, doi:10.1016/j.celrep.2012.12.008 (2013).