Massive DNA sequencing reveals disease-causing mutations

19th February 2014
By Yasin Memari

The UK10K consortium has collected more than 100 terabytes of raw genetic sequence data from more than 10,000 people since it was launched in 2010 to investigate the role of rare genetic variation in human health and disease. This huge collection of data has created a valuable resource that will be used in research for years to come.

The project sequenced the genomes of two groups using next-generation whole-genome or whole-exome sequencing technology: 4,000 healthy people from the Twins UK Registry and the Avon Longitudinal Study of Parents and Children, and 6,000 patients with recognised diagnoses. Comparing all of these data has helped researchers to reveal genetic links to disease susceptibility and to better understand the biology of disease.

I have been involved in the UK10K Cohorts project at the Wellcome Trust Sanger Institute. This project studies the genetic basis of complex traits in the healthy cohorts. Complex traits are characteristics, or phenotypes, that are influenced by variation within multiple genes combined with behavioural and environmental factors.

In particular, our study focuses on phenotypes involved in metabolic, cardiovascular and anthropometric outcomes. These include endpoint diseases common in the UK population such as diabetes, obesity, hypertension, heart disease and poor renal/liver function. By studying the underlying complex traits, the UK10K project is working towards understanding the genetic predisposition to these diseases.

I contribute to the cohorts’ genotype-phenotype association analyses. We examine the sequence data to identify genetic changes that influence the phenotypes, using standard genome-wide association study (GWAS) approaches. In addition, the sequence data enable us to apply gene-based methods, which combine information across multiple variants within a gene. We analyse each cohort separately and then combine the summary statistics to gain statistical power, an approach known as meta-analysis. The full association analysis involves rigorous statistical testing, follow-up validation and replication, along with interpretation of the results in the context of the population structure of people in the UK.
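To illustrate the meta-analysis step, here is a minimal Python sketch. It assumes inverse-variance-weighted fixed-effect combination, which is one common way to pool per-cohort summary statistics and not necessarily the method used in UK10K. The function name and the example numbers are hypothetical.

```python
import math


def fixed_effect_meta(cohort_stats):
    """Pool per-cohort (effect size, standard error) pairs for one variant.

    Assumption: inverse-variance weighting under a fixed-effect model.
    """
    # Each cohort is weighted by the inverse of its variance (1 / se^2),
    # so more precise estimates contribute more to the pooled effect.
    weights = [1.0 / se ** 2 for _, se in cohort_stats]
    total_weight = sum(weights)

    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, cohort_stats)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Two-sided p-value from the standard normal distribution.
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical summary statistics for one variant in two cohorts.
example = [(0.10, 0.05), (0.20, 0.05)]
beta, se, z, p = fixed_effect_meta(example)
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

The pooled standard error is smaller than either cohort's own, which is why combining the cohorts can detect associations that neither would detect alone.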

By carrying out a thorough analysis, we are trying to maximise the utility of UK10K data in revealing the genetic architecture of common diseases and complex traits.

In addition to its many applications in association mapping and understanding the biology of disease, the data will be a great resource for studying the genetic diversity and demographic history of the UK population. Even though the native UK population is largely homogeneous, several migration events have shaped its genetic ancestry. The UK10K data may be used to uncover population substructure in the UK, which could be informative about its demographic history.

Yasin Memari is a Postdoctoral Fellow who works in the Genome Informatics group at the Wellcome Trust Sanger Institute. He is a physicist by training who now works in bioinformatics and statistical analysis of human genome sequencing projects.
