The Cancer Genome Project researchers developed a tool to bring all the data on mutations in cancer together, making research faster and easier. Previously they had been logging data in a spreadsheet, but the volume and speed of information being published outstripped their ability to keep it up to date. So the Catalogue of Somatic Mutations in Cancer (COSMIC) was born. It captures data on somatic DNA mutations, which we accumulate over a lifetime, rather than germline mutations, which are inherited from our parents. Two full-time expert curators were employed to feed in data.
At its launch in 2004, the COSMIC database was populated with information on four genes, one of which was BRAF. The other three included at the start were HRAS, KRAS and NRAS – these genes have functions in a cell that interact with BRAF in the same molecular chain of events. Much of the early work was standardising the terms and data formats used, to make it possible to compare data between genes, between different types of cancer, and between different research groups around the world. After a year, the database had information from 1,700 scientific papers on 1,800 mutations in 21 genes. The numbers have increased ever since.
The type of information included has expanded too, as new techniques and understanding of cancer genomes emerged. Data on fusion genes – hybrids formed of two separate genes – was added in 2007. Data from whole genomes, not just genes, was included in 2009. Copy number variation data, detailing how many copies of a gene are present, was added in 2013. Data on gene expression – how active a gene is in a cell – was added in 2015. As the nature of research papers changed, purely manual curation was no longer realistic. The team created systems to import whole genome sequence data alongside expert analysis by curators. Ten years after its launch, in 2014, COSMIC had details of mutations in almost all of the approximately 20,000 human genes.
Finding which of those mutations has a role to play in cancer is essential. As a tumour cell divides, its DNA becomes more and more disordered and mutations accumulate – genome sequence data in cancer is noisy.