27 February 2013
By Zemin Ning
At the Wellcome Trust Sanger Institute we use a range of techniques to add, silence or delete genes so that we can study what they do. We use a specialised piece of DNA known as piggyBac that jumps around the genome of mice, inserting itself into genes and disrupting their function. The system is a very useful experimental tool that allows us to introduce DNA into cells in a controlled way.
In 2010 one of the Institute’s PhD students, Amy Li, approached me to analyse DNA data she had gathered from inserting piggyBac into mouse cells. She worked in Professor Allan Bradley’s team and they were keen to discover the places throughout the mouse genome where the gene was inserting itself. As she explained the study to me, I became fascinated by the action of this ‘jumping gene’.
To enable piggyBac to jump, we use an enzyme. It recognises DNA sequences that are characteristic of piggyBac (known as transposon inverted terminal repeats or ITRs), then “cuts” the gene out and “pastes” it into a new location. To find out exactly what was happening, Allan and Amy were keen to combine next-generation DNA sequencing with bioinformatics analysis to identify the gene’s landing sites.
This was where my team and I came in. I’m a bioinformatician and my role in the project was to lead the reading, mapping and analysing of the DNA sequences generated by the next-generation DNA sequencing machines. Next-generation DNA sequencing is a powerful way to rapidly gather high volumes of genomic information at relatively low cost. It has had a profound impact on our understanding of genetics and genome biology. But as well as generating a wealth of data, this technology produces a certain level of wrongly-read bases, or errors. An important part of my job is to identify and remove these errors to ensure reliable results.
As Amy explained to me, because of the enzyme’s action, the places where piggyBac lands are not random. In fact we were expecting to find that the gene inserted itself only into regions of the genome with TTAA sequences – the sequence that the enzyme targets. But this was not what we found.
Instead, we discovered that piggyBac sometimes lands in places that do not have the TTAA sequence. We found this by reading the genomic DNA adjacent to the piggyBac sequence and comparing it with the mouse reference genome sequence to see where the jumping gene had inserted itself.
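The comparison against the reference can be pictured with a minimal sketch. It assumes the reference is available as a plain string and that the insertion coordinate is the 0-based position of the first base of the four-base site; the function names and the toy sequence are hypothetical and are not taken from our pipeline.

```python
def site_at(reference: str, pos: int, width: int = 4) -> str:
    """Return the `width` reference bases starting at 0-based `pos`, upper-cased."""
    if pos < 0 or pos + width > len(reference):
        raise ValueError("site falls outside the reference sequence")
    return reference[pos:pos + width].upper()


def is_canonical(site: str) -> bool:
    """True when the site is the expected TTAA target sequence."""
    return site == "TTAA"


# Toy reference, invented for illustration only (not mouse sequence).
toy_reference = "GGCATTAACCGTCTAAGGA"

for pos in (4, 12):
    site = site_at(toy_reference, pos)
    label = "canonical" if is_canonical(site) else "non-TTAA"
    print(pos, site, label)
```

In practice the coordinate would come from mapping the flanking read to the reference, a step this sketch leaves out.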
The work was no small undertaking. To accurately find every place where piggyBac was inserted, we needed to identify the sites in parallel while using rigorous quality controls to ensure that what we found were genuine sites and not artefacts of our investigative processes.
Then Steve Pettitt joined the team to perform statistical analysis to give a better understanding of the mechanism and distribution of these unexpected landing sites. Overall, we found non-TTAA sites at a frequency of 2% in more than 30,000 insertions. In particular, we found that piggyBac landed in areas with CTAA/TTAG and ATAA/TTAT sequences. CTAA (1.4%) and ATAA (0.4%) were the major non-TTAA insertion sites.
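Pairs such as CTAA/TTAG are the same site read from opposite strands, so a tally needs to count them together. The sketch below shows one way to do that, assuming each insertion has already been reduced to its four-base site; the helper names are hypothetical and the toy list is invented, so its proportions are not our results.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strand_neutral(site: str) -> str:
    """Collapse a site and its reverse complement to one label (the alphabetically first)."""
    site = site.upper()
    return min(site, reverse_complement(site))


def site_frequencies(sites):
    """Fraction of insertions at each strand-neutral site."""
    counts = Counter(strand_neutral(s) for s in sites)
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}


# Invented toy data: 100 sites, mostly canonical.
toy_sites = ["TTAA"] * 96 + ["CTAA", "TTAG", "CTAA"] + ["TTAT"]
freqs = site_frequencies(toy_sites)
for site, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{site}: {freq:.1%}")
```

TTAA is its own reverse complement, so it needs no special handling here.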
But the question was: were our findings genuine, or were they due to sequencing errors? We were not sure.
To check the veracity of our research, we made sure that the reference genome sequences at these sites were also non-TTAA. We also examined sequence from both sides of the transposon, and found that the sequence agreed in many cases. Another clue was that all the sites had TA in the centre, with only the two outer bases varying (shown in the figure). If the changes were due to errors in the sequencing machine or software, they would likely be distributed evenly across all four bases of the site. We therefore concluded that these unexpected insertions are real.
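The positional argument can be illustrated with a small tally of where each observed site differs from TTAA. This is a sketch only: the function names are hypothetical and the toy sites are simply the four non-TTAA sequences named above plus one canonical site, not our data.

```python
from collections import Counter

CANONICAL = "TTAA"


def mismatch_positions(site: str):
    """0-based positions at which `site` differs from TTAA."""
    site = site.upper()
    if len(site) != len(CANONICAL):
        raise ValueError("expected a four-base site")
    return [i for i, (a, b) in enumerate(zip(site, CANONICAL)) if a != b]


def tally_mismatch_positions(sites):
    """Count, across all sites, how often each position differs from TTAA."""
    tally = Counter()
    for site in sites:
        tally.update(mismatch_positions(site))
    return tally


toy_sites = ["CTAA", "TTAG", "ATAA", "TTAT", "TTAA"]
tally = tally_mismatch_positions(toy_sites)
for position in range(len(CANONICAL)):
    print(position, tally[position])
```

Random read errors would be expected to put counts at all four positions, whereas sites like these leave the two central positions at zero.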
When piggyBac inserts into a non-TTAA site, this produces mismatches in the DNA that are repaired by the host cell's mismatch repair pathways. Because piggyBac could be carried from place to place with the mismatches we have discovered, it is highly likely that using piggyBac can generate point mutations in the genome. Steve and Amy went on to investigate how these are repaired.
In terms of the paper we published (listed below), that is the end of the story. But it is not the end of our research. My colleague Hannes Ponstingl has used our experience to develop an efficient informatics pipeline to filter and analyse high-throughput DNA sequencing data so that we can continue our research, and use the same approach to analyse the action of another jumping gene – Sleeping Beauty.
Amy now works as a postdoctoral fellow at UC Berkeley, USA. Steve Pettitt is now based at the Institute of Cancer Research, London.
Zemin Ning is a Senior Scientific Manager. The Sequence Assembly & Analysis group, which he leads, provides bioinformatics support for various sequencing projects at the Sanger Institute.
Research paper: Li AM et al. The piggyBac transposon displays local and distant reintegration preferences and can cause mutations at non-canonical integration sites. Mol Cell Biol. 2013 Jan 28.
Sequencing at the Sanger Institute: http://www.sanger.ac.uk/resources/technologies/sequencing.html