4,011 manual interventions - the most complicated genome
The 1,000 genomes have been achieved in part because of step changes in laboratory and computer technologies. In the laboratory, we are able to generate ever more accurate and longer stretches of sequencing read data, which make genome assembly easier - putting back together bigger puzzle pieces rather than smaller ones. While we cannot yet read each chromosome in one go, the accuracy of these new methods (about one error in 10,000 letters read) makes the “assembly” process much easier and more accurate. The assembly process uses new computer tools and hardware that perform at speeds and scales that were dreamed of but not achievable just five years ago.
Bioinformaticians use several cutting-edge computer tools - often developed by the teams themselves - to ensure that the DNA sequence submitted is correct. These tools are trained to spot and correct a whole range of possible errors, but can be fooled by complicated sections of the genome unique to each species, especially regions that are highly repetitive. For these very difficult regions, an expert human needs to go in and complete the genome by eye, manually positioning bits of sequence correctly in the chromosomes.
The genome that required the most human attention was that of the Common Toad (Bufo bufo), with 4,011 interventions needed to build a correct representation of its 5-billion-letter genome. In contrast, 14 of the first 1,000 genomes required no manual interventions at all!