28th January 2014
By Tom Huckvale
As the cost of DNA sequencing continues to drop, biologists have been drawn towards applying the technology to the more unexplored areas of the natural world, revealing information hidden away in the genomes of even the smallest creatures.
In many of these cases, simply obtaining enough DNA to create a sequencing library, a sample of chopped-up DNA fragments that can be read by our Illumina sequencing machines, is impossible or impractical. The organism may be physically too small, or it may be difficult to grow a large enough population to obtain the necessary starting amount of DNA. This is a common problem when working with parasitic worms and single-celled protozoa.
Sequencing a large pool of individuals from a species can supply enough DNA, but it becomes problematic when trying to determine sequence variants between individuals. Recent efforts have therefore focussed on whole-genome amplification: amplifying very small quantities of DNA from one individual to create enough template to produce a library.
One of the more commonly used methods is multiple-strand displacement amplification. This method uses an enzyme from a bacteriophage, a virus that infects bacteria, to create libraries of DNA from minute amounts of starting material. In our study, we sequenced Caenorhabditis elegans, a free-living nematode found in soil, using multiple-strand displacement amplification and then compared our results with the high-quality reference sequence that already exists for this nematode.
The replication wasn’t perfect. A big challenge in using whole-genome amplification is the appearance of chimeric DNA, pieces of sequence that are in the wrong order, in the finished library. This affects the already highly complex procedure of de novo genome assembly, where the chopped-up DNA fragments used for sequencing libraries are computationally pieced back together in the correct order. We believe that the chimeric DNA fragments originate from priming sites at displaced 3’ ends of the DNA. Initially, 5’ ends are displaced by the DNA polymerase extending the random primers, and the resulting single-stranded ‘branches’ are able to re-anneal to downstream fragments, resulting in templates containing an inversion on either side of a deleted region, shown in the image above.
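For readers who like to see this in code, the shape of such a chimera can be mimicked in a few lines of Python. This is only a toy model and not the analysis used in our study: the function names, the example sequence and the parameters (start, arm, gap) are all made up for illustration. It joins one stretch of a reference sequence to the inverted (reverse-complemented) copy of a stretch further downstream, skipping the bases in between.

```python
# Toy model of a chimeric fragment. All names and values here are
# hypothetical; this is not the pipeline used in the study.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def make_chimera(reference: str, start: int, arm: int, gap: int) -> str:
    """Join a segment to the inverted copy of a downstream segment.

    `arm` bases are taken from `start`, the next `gap` bases are skipped
    (the deleted region), and the following `arm` bases are attached in
    reverse-complement orientation (the inversion).
    """
    first = reference[start:start + arm]
    downstream_start = start + arm + gap
    second = reference[downstream_start:downstream_start + arm]
    if len(first) < arm or len(second) < arm:
        raise ValueError("segments run past the end of the reference")
    return first + reverse_complement(second)


# Made-up 30-base "reference" sequence.
reference = "ATGGCGTACCTTGAACGTTAGCCATGCAAT"
chimera = make_chimera(reference, start=0, arm=6, gap=4)
print(chimera)
print(chimera in reference)
```

The resulting fragment does not occur anywhere in the toy reference, even though both of its halves come from it, which is why fragments like this can mislead an assembler.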
In close collaboration with colleagues at the Division of Parasitology, in the Department of Infectious Disease at the University of Miyazaki, Japan, we have shown the effect whole-genome amplification has on the Illumina sequencing machine’s reads and our commonly used assembly algorithms.
Knowing when and how errors such as chimeric reads arise, and how to spot them, can give researchers confidence in their subsequent interpretation of the sequence data. Being able to sequence genomes from much smaller amounts of starting DNA makes possible the routine genomic analysis of novel microscopic organisms, single bacterial and human cells, and rare historical, forensic and archaeological samples.
Tom Huckvale is a member of the Parasite Genomics group at the Wellcome Trust Sanger Institute.
- Tsai IJ, Hunt M, Holroyd N, Huckvale T, Berriman M, Kikuchi T. (2013) Summarizing Specific Profiles in Illumina Sequencing from Whole-Genome Amplified DNA. DNA Research. doi: 10.1093/dnares/dst054