All posts by sangerinstitute

From the Wellcome Sanger Institute, a charitably funded genomic research organisation

Sanger Science

Gene responsible for diarrhoeal disease transmission identified

Clostridium difficile [Credit: CDC/Dr. Gilda Jones]

21 June 2012

Written by Laura Deakin

As a Ph.D student at the Wellcome Trust Sanger Institute, I have spent nearly four years investigating how Clostridium difficile bacteria are able to infect people. This bacterium, which is found in hospitals and is rife in the developing world, has been a hot topic of discussion in both the scientific literature and mainstream media in recent years.

C. difficile is the leading cause of antibiotic-associated diarrhoea in developed countries and has been responsible for a number of deaths in hospital patients. The bacterium releases spores that are highly infectious and cannot be killed by standard hospital cleaning routines. As a result, C. difficile bacteria are now widespread in many hospitals, where they are capable of causing major epidemics that are becoming increasingly frequent and severe.

To understand how the bacteria are able to infect people and transmit from one person to the next, I have been investigating the role of a gene called spo0A. Working with the C. difficile team at the Institute, I infected mice with C. difficile to allow us to recreate and study many aspects of the disease, including its persistence and transmission in humans.

Using these mice as a model, we are able to mimic the transmission of C. difficile within hospitals and the effects of different techniques employed to minimise its spread. For example, we are able to explore the impact on transmission of patient-to-patient contact and shared rooms, and to study the effectiveness of patient isolation in lowering infection rates.

The study we published online in the journal Infection and Immunity looked at the role the spo0A gene plays in allowing C. difficile to transfer from person to person. We found that the bacterium had to have a normal version of the gene for it to be transmitted. The gene is essential for disease transmission.

Further study revealed that spo0A is also responsible for the persistent nature of C. difficile. This persistence is seen in patients who have been given vancomycin (a powerful antibiotic) to treat the disease. The treated patients recover and return home to an environment that contains C. difficile. The bacteria are then able to reinfect them, resulting in a second wave of disease. Some people can experience multiple episodes of infection over many years. Successful reduction of transmission would greatly reduce the threat of C. difficile as a cause of disease in hospitals.

Our findings suggest that the spo0A gene is a potential target for the development of therapies to disrupt or stop C. difficile transmission. The discovery of this gene’s role also has clinical implications for the management of patients in hospital to minimise transmission: for example, by isolating infected patients and by using ‘barrier nursing’ (that is, wearing gloves and gowns when treating the patients and employing heightened disinfection regimes).

This discovery is just the beginning: now that we’ve identified the importance of spo0A in transmission and persistence, we are expanding our search to find other, related genes that may also play a role. Finding these genes will allow us to identify points of intervention that might ultimately be used to contain the bacteria’s spore-mediated transmission and limit the spread of C. difficile.

Laura Deakin is a Ph.D student in the Microbial Pathogenesis team at the Wellcome Trust Sanger Institute…

Paper: Deakin L et al. Clostridium difficile spo0A gene is a persistence and transmission factor. Infect Immun 2012. doi: 10.1128/IAI.00147-12

Credit: Luc Viatour /
Sanger Science

Creating a gold-standard, not a rotten, tomato genome

Recently the full reference genome of the tomato (Solanum lycopersicum) was published in Nature (31 May 2012). Here, at the Wellcome Trust Sanger Institute, some of our sequencing people took part in the international collaboration of 10 countries that developed the DNA sequence. Each research group was tasked with working on a different chromosome, and we sequenced Chromosome 4. By being part of the project we were able to share our experiences and knowledge from producing animal reference genomes to enable the plant genome research teams to work together to deliver high-quality, standardised data.

When the tomato genome sequencing project began, the teams estimated that the genome was 950 million base pairs (Mb) in size, split across 12 chromosomes. This was no small undertaking: it is one-third the size of the human genome (a project that had taken a worldwide collaboration 10 years to deliver). In addition, the project had limited funding, meaning that the work needed to be as tightly focused and efficient as possible.

Fortunately, gene-rich areas make up only 25 per cent of the tomato genome, so the project teams agreed that capturing and sequencing these areas alone would provide the most valuable information in the most effective way. To achieve this, we used mapping techniques to identify the gene-rich areas and used clone-by-clone sequencing to fully sequence them using the smallest number of sequencing runs.

Clone-by-clone sequencing

We took clones from existing libraries and digested them with restriction enzymes, producing a fingerprint signature for each. We processed these fingerprint signatures in a database known as FPC (Fingerprint Contigs). Sections of signature in common indicate an overlap between clones, and these overlaps can often be verified if known markers can be placed in them. By knowing where each clone belonged on the chromosome, we were able to select only a minimal set of clones to cover the area of interest. We made the FPC database for all the chromosomes publicly available for the research community.

Fig 1. Screenshot showing the Fingerprint Contigs database. Clones highlighted in red and grey show the minimal tiling path selected for the sequencing project.
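Choosing a minimal set of clones to cover a region can be thought of as an interval-covering problem. The sketch below is a simplified illustration only: it is not the FPC software or the method the project teams actually used. It assumes each clone has already been placed at a start and end position on the chromosome, and applies a standard greedy approach to pick the fewest clones that cover the region. The `Clone` type, the function name, and all clone names and coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # hypothetical map position where the clone begins
    end: int    # hypothetical map position where the clone ends


def minimal_tiling_path(clones: List[Clone], region_start: int, region_end: int) -> List[Clone]:
    """Greedily choose the fewest clones whose spans cover the region."""
    ordered = sorted(clones, key=lambda c: c.start)
    path: List[Clone] = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones starting at or before the covered point,
        # keep the one that reaches furthest along the chromosome.
        while i < len(ordered) and ordered[i].start <= covered_to:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best is None or best.end <= covered_to:
            raise ValueError(f"no clone covers position {covered_to}")
        path.append(best)
        covered_to = best.end
    return path


# Invented example data: six overlapping clones across a 120-unit region.
example_clones = [
    Clone("A", 0, 40), Clone("B", 10, 70), Clone("C", 30, 60),
    Clone("D", 65, 100), Clone("E", 50, 90), Clone("F", 85, 120),
]
print([c.name for c in minimal_tiling_path(example_clones, 0, 120)])
# ['A', 'B', 'D', 'F']
```

In this toy example, clones C and E are left out because their spans are already covered by neighbouring clones, so four clones need to be sequenced rather than six.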

Using this approach, we mapped, sequenced and finished the gene-rich clones of Chromosome 4, which was estimated to be roughly 19 Mb long. The UK team was led by Principal Investigators Gerard Bishop from Imperial College London, Graham Seymour from Nottingham University, Glenn Bryan from the Scottish Crop Research Institute, and Jane Rogers from the Sanger Institute.

Finishing the genome

However, mapping and sequencing are not the whole story when producing a high-quality reference genome: the sequences need to be pieced together and inconsistencies resolved. In other words, the sequences need to be finished. This can be a time-consuming process, especially if the groups in a project work to differing standards and approaches. Fortunately, we have long experience in finishing DNA sequencing data from our work on the human, mouse and zebrafish genome projects. So, to enable the other international teams to draw on our experience and to develop the common standards needed for efficient finishing, we organised two International Finishing Workshops.

In these, representatives of the different research groups from across the world met and discussed the various challenges of working with the sequencing data. It was a chance to pool experience and look at efficient ways to progress the data set for each of the chromosomes. Our discussions centred on techniques for improving the data for the clones, as well as ensuring that the metrics all the teams used to assess the quality of each clone were comparable.

By meeting and talking through the issues, the teams ensured that the resulting genomic sequence from all the laboratories involved showed parity. This data was then annotated and made publicly available for the wider Solanaceae research community.

We were also able to make a useful contribution by guiding the project teams through the challenges of incorporating data from new sequencing technologies, which the project went on to adopt.

Funding bodies: BBSRC, EU-SOL, DEFRA and the Wellcome Trust

Sanger Life

Fourth Institute bioinformatician wins open access award

A fourth Wellcome Trust Sanger Institute alumnus – Heng Li – has won the eleventh Benjamin Franklin Award for Open Access in the Life Sciences. Remarkably, the Institute has trained and developed more than one third of the winners of this award, reinforcing the data-sharing and open-access ethos of the Institute. Even more remarkably, all four winners have been trained in Richard Durbin’s research group: Heng Li (2012), Alex Bateman (2010), Sean Eddy (2007) and Ewan Birney (2005).

Heng Li was chosen from a shortlist of seven open-access practitioners by voting members of the community.

The Sanger Institute is founded on, and dedicated to, open access to and sharing of data to power bioinformatics research around the world. However, data sharing without the tools to interpret and interrogate the data is useless. So the Institute is also committed to research that drives the development, delivery and sharing of software that allows genomic data to be compared, mined and studied.

It’s therefore really fitting that Heng Li has been awarded the Benjamin Franklin Award for Open Access in the Life Sciences. His input has created essential tools to enable next-generation sequencing data to be analysed, interpreted and shared. For example, he has helped to produce a range of sequence alignment tools and algorithms including SAMtools, BWA, MAQ and TreeSoft. Using these programs, researchers have been able to read the whole genomes of organisms to find genetic differences between individuals in the same species. Research based on the 1000 Genomes Project into structural changes in the genome and the genetic basis of human disease, for instance, uses this software.

In addition, he has developed tools to analyse gene family evolution and build phylogenetic trees, including the TreeBeST program and the TreeFam and Ensembl GeneTrees databases. Research using these resources is revealing insights into the evolution of species and the changes happening within them.

Yet such tools are of little value unless researchers are given help and advice in using the software and databases and mining their full potential. Heng has not only contributed to the creation and sharing of a wide range of vital software tools that form an essential resource for bioinformaticians around the world; he is also dedicated to sharing knowledge, regularly contributing to bioinformatics forums and guiding new users so that bioinformaticians can understand and use his tools.

Info on previous winners from Richard Durbin’s group (taken from the Benjamin Franklin Award web page):

2010 – Alex Bateman
Alex won the 2010 Benjamin Franklin Award for leading the freely available Pfam, Rfam and MEROPS databases. He was also the Executive Editor for the open-access Database issue of the journal Nucleic Acids Research for many years. Furthermore, Bateman helped initiate the RNA Families track at the journal RNA Biology, where a Wikipedia article is required for each published RNA family.

2007 – Sean Eddy
Sean received the 2007 Benjamin Franklin Award for the development and free distribution of HMMER, which has revolutionized the use of profile Hidden Markov Models in protein sequence analysis, and for the co-creation of the Pfam database of protein domains and families, which has been an essential counterpart as the basis of genome annotations, family classification systems such as GO, and much of our common language of protein annotation.

2005 – Ewan Birney
Ewan Birney was honoured with the 2005 Benjamin Franklin Award for his promotion of Open Access in bioinformatics and science. He has been a key developer in the Ensembl and BioPerl projects and a strong advocate for making genome information freely available.