10th October 2013
by Matthew Mayho
How do you identify the essential bacterial genes that are involved in key processes such as colonisation, infection persistence, or antibiotic resistance? A relatively new technique called TraDIS (transposon-directed insertion-site sequencing) has been developed, in part, here at the Wellcome Trust Sanger Institute and is now being used by us and at the Department of Veterinary Medicine in Cambridge to quickly find the weak spots in bacteria.
TraDIS helps researchers to find the function of genes by randomly inserting transposons into the genomes of millions of cells and assessing the effects of the disruption on cell viability. If an individual cell dies, it’s clear that the site where the transposon inserted was essential to the cell’s survival. By sequencing the DNA of all the millions of cells that do survive, we can see which genes are important, because no insertion sites will be recovered from those genes. This technique can then be used across a variety of test conditions, such as testing for antibiotic resistance.
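The logic of reading essential genes off the missing data can be illustrated with a minimal Python sketch. It assumes the insertion sites have already been mapped to genome positions; the gene names, coordinates, insertion positions and the `max_insertions` cut-off are all invented for illustration and are not taken from any real TraDIS dataset or pipeline.

```python
def insertion_counts(genes, insertion_sites):
    """Count mapped transposon insertion sites that fall inside each gene.

    genes: dict of gene name -> (start, end), 1-based inclusive coordinates.
    insertion_sites: iterable of genome positions where insertions were found.
    """
    counts = {name: 0 for name in genes}
    for pos in insertion_sites:
        for name, (start, end) in genes.items():
            if start <= pos <= end:
                counts[name] += 1
    return counts


def candidate_essential(genes, insertion_sites, max_insertions=0):
    """Genes with no (or very few) surviving insertions are candidate essentials."""
    counts = insertion_counts(genes, insertion_sites)
    return sorted(name for name, n in counts.items() if n <= max_insertions)


# Hypothetical toy genome: three genes and seven mapped insertion sites.
genes = {"geneA": (1, 900), "geneB": (1000, 2200), "geneC": (2300, 3100)}
sites = [15, 120, 433, 871, 2350, 2900, 3001]

print(candidate_essential(genes, sites))  # ['geneB']
```

In this toy example no surviving cell carries an insertion in `geneB`, so it is flagged as a candidate essential gene. A real analysis would also need to account for gene length and overall insertion density before drawing that conclusion.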
However, making this technique work is no easy task for the R&D team here at the Sanger Institute. Our aim in TraDIS projects is to locate where in the genome the transposon has inserted, which turns out to be more difficult than it might seem because there is only one transposon insertion per bacterial cell. This means that an insertion point would only be encountered roughly once in every 10,000 fragments of sequence, each of which is around 400 letters long! Also, TraDIS doesn’t work quite like normal sequencing, so there are a few technical challenges to overcome.
Only 1 in 10,000 library fragments contains transposon sequence (yellow). Bacterial genomic DNA (black) is attached to special adapters (green and tan). The sequencing primer starts the sequencing process and attaches to the transposon about 10 bases away from genomic DNA.
When we sequence a standard library on the Illumina platform (the main sequencing technology used at the Sanger Institute), the DNA clusters on the sequencing flowcell are all different because the fragments forming each cluster are generated as a result of random shearing. However, in TraDIS, because we sequence from within the transposon at a known location, the first 10 bases or so of sequence data from TraDIS libraries are identical among all the clusters. These 10 bases are useful because they verify that the read has been derived from a true transposon integration event. They do, however, cause difficulties on Illumina sequencing machines, which do not expect to encounter clusters all reporting the same DNA sequence.
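As a rough illustration of how those leading bases can verify a read, the Python sketch below keeps only reads that start with an expected transposon tag and returns the genomic portion that follows. The tag sequence `TAAGAGACAG`, the function name and the example reads are hypothetical placeholders, not the actual transposon end used at the Sanger Institute.

```python
# Hypothetical 10-base transposon tag, used here for illustration only.
TRANSPOSON_TAG = "TAAGAGACAG"


def split_tradis_read(read, tag=TRANSPOSON_TAG, max_mismatches=0):
    """Return the genomic part of a read if it starts with the tag, else None."""
    prefix = read[:len(tag)]
    if len(prefix) < len(tag):
        return None  # read too short to contain the whole tag
    mismatches = sum(a != b for a, b in zip(prefix, tag))
    if mismatches > max_mismatches:
        return None  # does not look like a true transposon junction
    return read[len(tag):]


reads = [
    "TAAGAGACAGACGTTGCA",  # starts with the tag: kept
    "GGGGAGACAGACGTTGCA",  # three mismatches in the tag: rejected
]
print([split_tradis_read(r) for r in reads])  # ['ACGTTGCA', None]
```

Allowing a small `max_mismatches` would tolerate occasional sequencing errors in the tag, at the cost of admitting a few more false junctions.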
Why should the nature of the DNA sequence across the thousands of clusters affect the workings of the Illumina sequencer? In essence, sequencing on Illumina’s next-generation platform relies on the image taken of millions of DNA clusters fluorescing on a glass slide. Of course, DNA doesn’t normally fluoresce, but in this case the basic building block of DNA that is added during each cycle of sequencing is chemically modified so that under laser or LED illumination, it emits light of a particular colour (fluoresces). Each of the four DNA building blocks (A, T, C and G) carries a slightly different chemical modification and emits a different wavelength of light, and these are captured on four separate images. When a standard library is sequenced, each image will contain a similar percentage of clusters that are fluorescing (roughly 25%). However, TraDIS samples do not work in the same way and herein lies the problem!
During the first 10 cycles of TraDIS sequencing, the clusters will report fluorescence in one channel only, because the Illumina machine needs to sequence through identical transposon sequences. This means cluster density in one of the channels is always high, and the sequencing machine will not be able to distinguish between discrete clusters of DNA when it tries to map out the positions of all the clusters.
In a standard library there are clusters (black spots) present in the image taken for each fluorophore. The density of fluorescing clusters for any particular image is not too great and, importantly, overlapping clusters are likely to carry different fluorophores, which allows them to be identified as discrete clusters.
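The contrast between a balanced library and a TraDIS library can be seen by tabulating, for each cycle, what fraction of reads shows each base. The Python sketch below does this for a handful of made-up reads; the tag `TAAGAGACAG`, the genomic sequences and the 0.9 threshold are illustrative assumptions, and the calculation is a simplification rather than anything the Illumina software itself performs.

```python
def per_cycle_base_fractions(reads):
    """For each cycle (read position), return the fraction of reads showing each base."""
    n_cycles = min(len(r) for r in reads)
    fractions = []
    for cycle in range(n_cycles):
        column = [r[cycle] for r in reads]
        fractions.append({b: column.count(b) / len(column) for b in "ACGT"})
    return fractions


def low_diversity_cycles(reads, threshold=0.9):
    """1-based cycle numbers where a single base dominates above the threshold."""
    return [
        i + 1
        for i, frac in enumerate(per_cycle_base_fractions(reads))
        if max(frac.values()) > threshold
    ]


# Hypothetical TraDIS-like reads: identical 10-base tag, then varied genomic DNA.
tag = "TAAGAGACAG"
tradis_reads = [tag + genomic for genomic in ("ACGT", "CGTA", "GTAC", "TACG")]

print(low_diversity_cycles(tradis_reads))          # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(per_cycle_base_fractions(tradis_reads)[10])  # {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}
```

In the first 10 cycles every read reports the same base, so only one channel lights up; from cycle 11 onwards the four bases are evenly represented, as they would be throughout a standard library.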
A second challenge of TraDIS is intensity normalisation. Each of the four fluorophores used to visualise the different DNA bases has an intrinsically different level of fluorescence, which needs to be normalised. This is done by comparing intensity levels across the four channels. The problem with TraDIS is that no comparison can be made because three of the four images for any cycle will have effectively zero fluorescence.
There are two other processes taking place that the Illumina sequencer needs to take into account – cross-talk and phasing/prephasing, but I’ll leave an explanation of these to another day! The important thing to remember is that, for the Illumina sequencer, the first few cycles of sequencing are vital for making various key adjustments and calculations (including mapping cluster positions) and that with TraDIS libraries this coincides with the identical sequence of the transposon.
TraDIS-specific recipes begin sequencing when a transposon-specific DNA primer binds to the single strand of DNA. For the first 10 cycles sequencing takes place but there is no imaging. The camera is switched on when bacterial DNA is reached. In the subsequent transposon read, the presence of the transposon sequence is confirmed.
To overcome all these issues, the R&D team at the Sanger Institute, in collaboration with Illumina, have developed a customised sequencing process for Illumina machines where ‘dark’ cycles take the place of imaging for the first 10 bases. Effectively, the sequencing machine skips the transposon sequence and begins to take images when the sequence moves on to normal genetic material from the bacterial sample. This allows the sequencer to efficiently map the clusters and make the calculations needed to interpret the images and generate accurate sequence data. Once the bacterial genome part of the library fragment has been sequenced, the Illumina machine goes back and sequences the 10 bases of transposon in a subsequent set of read cycles. This time round the process runs smoothly because the cluster positions are already known and the normalisation calculations have been made.
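The order of events in such a run can be laid out as a simple schedule. The Python sketch below is only a toy model of the idea: it is not the Illumina recipe format, the function and read names are invented, and the figure of 40 imaged genomic cycles is an arbitrary assumption chosen for the example.

```python
def tradis_cycle_plan(dark_cycles=10, genomic_cycles=40, transposon_cycles=10):
    """Build an ordered list of (read_name, cycle_in_read, imaged) steps."""
    plan = []
    # The read starts inside the transposon: chemistry runs, camera stays off.
    for cycle in range(1, dark_cycles + 1):
        plan.append(("genomic read", cycle, False))
    # Imaging begins once the read reaches bacterial DNA, so cluster mapping
    # and normalisation are done on diverse sequence.
    for cycle in range(dark_cycles + 1, dark_cycles + genomic_cycles + 1):
        plan.append(("genomic read", cycle, True))
    # A subsequent read goes back over the transposon bases, now imaged.
    for cycle in range(1, transposon_cycles + 1):
        plan.append(("transposon read", cycle, True))
    return plan


plan = tradis_cycle_plan()
first_imaged = next(step for step in plan if step[2])
print(first_imaged)  # ('genomic read', 11, True)
```

The first image is taken at cycle 11, which is the first base of bacterial DNA, and the transposon bases are only imaged later, once cluster positions are known.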
My suspicion is that, within a year or two, these sequencing challenges will be solved and we won’t need to use specially customised programmes. Next year, Illumina sequencing technology that uses ordered array flowcells is due to be released and this will enable the analysis software to have advance knowledge of all possible cluster positions. In theory, this means sequencing of ‘low complexity’ DNA such as TraDIS libraries might become easier. Until then, the Sequencing Research and Development team will have to continue with the challenging task of making Illumina sequencing work for TraDIS.
Matthew Mayho works in the Sequencing R&D team at the Sanger Institute where he helps to develop Next-Generation Sequencing protocols and transitions them to an operational setting.
- Luan, S., Chaudhuri, R.R., Peters, S.E., Mayho, M., Weinert, L.A., Crowther, S.A., Wang, J., Langford, P.R., Rycroft, A., Wren, B.W., Tucker, A.W., Maskell, D.J. (2013) Generation of a Tn5 transposon library in Haemophilus parasuis and analysis by transposon-directed insertion-site sequencing (TraDIS) Veterinary Microbiology,