Tag: sequencing

Sanger Science

Solving the mysteries of developmental disorders: The DDD study

By: Alison Cranage
Date: 16.10.18

Alix and Pip enjoying a day out. The genetic change responsible for their rare disease - DDX3X - was identified by the Deciphering Developmental Disorders team who read their DNA codes (and those of their parents) and compared them with thousands of other children and parents. Photograph used with kind permission of Clare Millington

A rare disease is one that affects fewer than 1 in 2,000 people (0.05 per cent of the population). There are about 7,000 known rare diseases and, on average, five new ones are described in the literature every week. Most are genetic, caused by mutations in a person’s DNA.

Some of the conditions may be familiar, like Huntington’s disease or cystic fibrosis, which affect thousands in the UK. Others affect just a handful of people in the world.

Clare Millington is mum to identical twins Pip and Alix, who have an extremely rare condition. They have difficulties with communication, sensory integration and movement. Alix also has type-1 diabetes. When they were younger, most of the girls’ hospital appointments were focussed on helping them with their symptoms.

Clare talked to us about her experiences.

“I was a little uncertain to start with when people started talking about learning difficulties whether that was a diagnosis in itself. It wasn’t really clear that wasn’t a diagnosis, it was just what manifested.”

The girls were 10 when their paediatrician in Newcastle suggested they join the Deciphering Developmental Disorders (DDD) study – to see if a genetic cause of their difficulties could be identified. They signed up and saliva samples were sent away for DNA analysis. Clare was busy, working and looking after the twins, and didn’t think more about the study until a letter came through the post four years later.

“We were told they had a DDX3X mutation and were handed a learned paper, which had been published just a month before. It was quite amazing to think they had found what had caused most of the girls’ difficulties.”

The diagnosis has had a huge impact on Clare’s family. They contacted the newly formed DDX3X foundation in America, and have connected with other families with the condition in the UK. Clare described how meeting others has been an amazingly positive experience.

“Previously we were on the sidelines. We’ve belonged to some cerebral palsy groups but we don’t quite fit. We’ve belonged to some communication aid users groups, but again, we’re the odd ones out. We were always welcomed but they weren’t like the other children.

“To have a support group of parents that totally identify with you is huge. You may not see each other or talk to each other often, but you do belong.”

There are now a total of just 250 girls in the world identified as having DDX3X Syndrome, though researchers think there are likely to be more. It’s clear that the condition is a spectrum, with some much more severely affected than others. But there are similarities between the girls too – and knowing about those can help with treatments.

“A lot of the girls have cortical visual impairment – their visual acuity is good, but actually the processing of the visual information is so poor that they are visually impaired.

“That’s very hard to pick up in a child with learning difficulties, because doing a sight test is tough. But once you know that it’s a common part of the condition, you can look for it. And think, maybe that’s the root reason they’re not learning to read or recognise shapes.”

Another common issue is constipation, which affected the twins too. At first, their inability to get potty trained was put down to their learning difficulties. But once people realised that there may be an underlying structural problem they looked further. It was found they have virtually no gut transit – this means the muscles in the gut aren’t working to move food along. Their treatment was changed.

“Knowing this helps them have a healthy life. Now we’re getting a treatment that actually works – because we know what the problem is. There is more research all the time, leading towards ways of getting good treatments.”

“I really think that the twins would be very much side-lined without their diagnosis. I really don’t think we would have moved forwards on many of the issues that they have without a diagnosis.”

Diagnosing thousands

In Numbers: Eight years of the Deciphering Developmental Disorders (DDD) Project

Pip and Alix are two of over 13,600 children who joined the DDD study, which started eight years ago. Researchers at the Sanger Institute have been working with NHS clinical genetics services across the UK to identify, catalogue and analyse gene changes that may be responsible for a whole range of developmental disorders. None of the participants had a diagnosis before they joined the study.

Now, all the participants’ data have been analysed and reports are being passed back to their clinical geneticists. The DDD team have found diagnoses for about a third of the children. They are committed to re-analysing the data to try to find a diagnosis for as many of them as possible.

Dr Helen Firth, Consultant Clinical Geneticist at Cambridge University Hospitals Trust, and Honorary Faculty at the Sanger Institute, is one of the founders of the study. She described how new knowledge means they can continue to make new diagnoses.

“We’ve been re-analysing at regular intervals through the project. On our first thousand patients we achieved a diagnostic rate of 27 per cent. Three years later when we re-analysed the same data, with new knowledge, we can lift that diagnosis rate to 40 per cent. That’s based on genes we’ve discovered, and based on genes others have discovered and putting those into the mix. Plus, there are new tweaks to the pipeline – things that we’ve learnt to improve; the way we are filtering the data.”

The team expect the pattern to continue as the project runs until 2021.

Navigating the new ethical questions

The DDD ethics team used an online survey to gather people's views from around the world

The study was one of the first of its kind in the world, looking at whole exome sequences of participants. It raised ethical issues, which were carefully considered from the outset, with the help of Professor Michael Parker and Dr Anna Middleton.

One of those questions was around ‘incidental’ findings. These are findings not related to the developmental disorder, and not looked for, but they could be important to the participant. For example, a mutation that increases the risk of developing cancer could be identified in someone’s DNA sequence. After careful consideration and consultation, the DDD team decided not to return these findings, should there be any, but to explore what kind of information people would want from such genome sequencing.

The DDD ethics team gathered the views of about 7,000 people from 75 different countries. They found that most people are interested in receiving genomic data, though not at all costs, particularly not if doing so could compromise the ability to conduct research.

Bringing together the views of the public, patients, participants, clinical geneticists and researchers has shaped the debate in the UK and internationally. It has also set a strong precedent for similar projects, as genome sequencing becomes more widely available.

The future

Diagnosis by the DDD Project has transformed the lives of Alix, Pip and their family by helping them to connect with other families with the same condition. Photo kindly provided by Clare Millington

A diagnosis doesn’t always change anything practical, as it did for Pip and Alix, but it is still important. Many families talk of the relief of finding out the cause of a condition. The diagnostic odyssey and years of testing are over.

DDD has over 200 associated research studies. These groups and collaborators are investigating the data to understand more about the conditions and the genes that cause them. So far, 125 research papers have been published. Mutations in hundreds of genes have been identified and 49 new conditions described.

Helen reflected on the eight years of the study.

“We’ve had the great good fortune to have excellent support from clinicians, scientists and families within the NHS. To bring those people together with the world-class team of scientists at the Sanger Institute has driven the study. It’s exceeded expectations. We’re continuing to discover new things using this data.”

“It really is a gateway. Once you can find what the genetic basis of these conditions is, it unlocks a lot of opportunities going forward.”

The team continue to open up these opportunities so that they can help more children like Pip and Alix.

As we move into the next medical era, genomics will be a foundation stone for many. Enabling research into some of the rarest conditions in the world now will bring new options in the clinic for the patients of the future.

With thanks to Clare Millington for sharing her story

About the author:

Alison Cranage is a science writer for the Wellcome Sanger Institute.


For support on living with a rare condition, contact Rare Disease UK

Recruitment for DDD has now ended. See their website to find out more about the study

Read more about the project’s eight years on the Sanger website: Milestone reached in major developmental disorders project

25 Genomes: The Common Starfish. Image credit: Ray Crundwell
25 Genomes, Sanger Life, Sanger Science

25 Genomes: The Common Starfish

By: Alison Cranage
Date: 04.10.18


The other-worldly, bright orange, five-limbed creature is instantly recognisable. Paddling on a Cornish beach, or rockpooling on the Isle of Mull at low tide – it’s pretty likely you’ll come across one.

Lurking in the shallow waters of the UK and across the North Atlantic, the common starfish (Asterias rubens) is one of 1,500 starfish species in the world.

Asterias rubens was nominated by the scientific community and won a public vote to have its genome sequenced as part of our 25 genomes project. The common starfish falls into our ‘cryptic’ category of creatures. Cryptic, because their behaviour and many hidden talents are not well understood.

Hidden talents

Starfish sperm
The DNA we collected for Asterias rubens was from its sperm. Professor Elphick’s lab in central London, where he collected the sample for us to sequence, is home to some 200 starfish.

Possibly the most remarkable feature of starfish is their ability to regenerate limbs. If a starfish is attacked or is in danger, it can lose an arm in order to escape. It then grows a new one in its place. Nobody’s exactly sure how this works, but the key to finding out will be in its genome. Understanding the process would have huge implications for regenerative medicine.

The starfish genome could also help research into glue, including surgical adhesives that are used to heal wounds. Asterias rubens feasts on mussels and other molluscs. To get to the meat inside a mussel, it attaches its tube feet to the shell, by secreting a glue, and pulls it apart. Researchers are interested in that glue, and the genome sequence might reveal more about its production and structure.


Professor Maurice Elphick is working with us on the starfish genome. His research interests lie in neuropeptides. These tiny molecules act in the brain to control a whole range of processes including pain, reward, food intake, metabolism, reproduction, social behaviours, learning and memory.

Starfish don’t have a brain, but they are more closely related to humans than they are to most invertebrates. They do have neuropeptides – and his team have discovered many already. Several are involved in the unusual feeding behaviour of starfish.

To eat a mussel, a starfish first forces open the shell and then pushes its stomach out of its mouth. It partially digests its prey, takes up the resulting mussel ‘chowder’ and then retracts its stomach.

“I’m interested in understanding the evolution of neuropeptide systems, and also want to compare their functions and to find out what homologous molecules are doing in very different biological contexts.”

Maurice Elphick, Professor of Animal Physiology & Neuroscience, Queen Mary University of London

One of the molecules they discovered triggers the stomach retraction. The equivalent molecule in humans clearly has a very different role. Professor Elphick explained: “Interestingly, we have also found that the neuropeptide behind the stomach retraction is evolutionarily related to a neuropeptide that regulates anxiety and arousal in humans.”

Professor Elphick explained how the genome sequence will enhance their ability to discover and study more neuropeptides. Because neuropeptides are tiny, the genes encoding them are not always easy to find. The team will study the genome in places where other species are known to have neuropeptide genes, to see if they can pinpoint an equivalent in the starfish (an approach known as synteny). This is only possible because we are using ‘long-read’ technology in the 25 genomes project – so the genomes will be the best possible quality, with few gaps.
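For readers who like to see the idea in code, the synteny approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's actual pipeline: the gene names, the scaffold name and all coordinates are hypothetical. The sketch takes the genes that sit either side of a known neuropeptide gene in a reference species, looks up their equivalents in the starfish annotation, and reports the stretch between them as the place to search.

```python
# Hypothetical gene order around a known neuropeptide gene in a reference species.
reference_order = ["geneA", "geneB", "neuropeptideX", "geneC", "geneD"]

# Hypothetical starfish annotation: gene name -> (scaffold, start, end).
# The neuropeptide gene itself has not been found yet, so it is absent.
starfish_genes = {
    "geneA": ("scaffold_7", 10_000, 14_000),
    "geneB": ("scaffold_7", 20_000, 26_000),
    "geneC": ("scaffold_7", 41_000, 45_000),
    "geneD": ("scaffold_7", 52_000, 58_000),
}


def candidate_interval(target, order, annotations):
    """Return (scaffold, start, end) of the region between the nearest
    annotated neighbours of `target`, or None if no such region exists."""
    idx = order.index(target)
    # Nearest annotated neighbour on each side of the target in the reference.
    left = next((g for g in reversed(order[:idx]) if g in annotations), None)
    right = next((g for g in order[idx + 1:] if g in annotations), None)
    if left is None or right is None:
        return None
    l_scaffold, l_start, l_end = annotations[left]
    r_scaffold, r_start, r_end = annotations[right]
    if l_scaffold != r_scaffold:
        # The neighbours sit on different scaffolds, so the region between
        # them is not contiguous in this assembly.
        return None
    # min/max handles the case where the gene order is inverted in the target.
    return (l_scaffold, min(l_end, r_end), max(l_start, r_start))


print(candidate_interval("neuropeptideX", reference_order, starfish_genes))
```

The early return for neighbours on different scaffolds is where assembly quality matters: the fewer gaps an assembly has, the more often both neighbours land on the same contiguous sequence.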

The future

The starfish genome is now sequenced and the raw data are available for any researcher to use. Over the coming months, our partners at EMBL-EBI will be assembling and annotating it, marking the position of genes and other features.

The finished genome will enable researchers to answer their own questions about evolution, glue, neuropeptides or growing new arms.



25 Genomes, Sanger Life, Sanger Science

10 surprises from sequencing 25 new species

By: Alison Cranage
Date: 04.10.18

Sequencing human genomes is now routine at the Sanger Institute. Bacteria, yeast, worms, malaria parasites and other pathogens are also regularly sequenced in their thousands. Our people are pretty well known for sequencing the human genome, but we’ve also contributed to the first sequencing of many others, including the mouse, rat, zebrafish, pig and gorilla.

The 25 genomes project is an entirely different beast. It’s posing some new, and frankly very odd, challenges. The diversity of the new species means we’ve had a steep learning curve. Here’s a peek at some of the weird and wonderful things we’ve discovered so far:

New Zealand flatworms will explode if you freeze them - not terribly helpful when trying to extract DNA from samples... Image Credit: S. Rae, Wikimedia Commons

1. Don’t freeze flatworms

They explode.

You may well ask why we’d freeze them in the first place. But freezing samples, or in this case, whole worms, is standard practice to store them ready for DNA extraction.

Freezing New Zealand flatworms didn’t go so well though. The resulting sticky goop proved difficult to handle… and to get DNA from.

Is this the Oxford Ragwort you are looking for? The best way to know is to take a picture and send it to an Oxford expert… Image credit: Rosser1954, Wikimedia Commons

2. It’s good to get a second opinion when you’re identifying something

The Oxford ragwort was chosen for sequencing in our flourishing category. We have ragwort growing here on campus, so we collected a plant to sequence.

But once we started, we soon realised it was not the ragwort we were looking for. The plant we had was hexaploid (it has 6 copies of its genome in every cell). The Oxford ragwort, which we were hoping to sequence, is diploid (it has 2 copies).

We sent a photo of the plant to an expert at Oxford University, who informed us we had the common ragwort.

There are 300+ species of blackberry – and telling them apart can literally take years of observation. Image credit: Fir0002, Wikimedia Commons

3. There are over 300 species of blackberry in the UK

Yes, 300+.

They differ in a whole host of characteristics: sweetness, number of drupelets (the little blobs that make up the fruit), colour, size, thorns, flowers, lifecycle and more.

Finding the right one wasn’t easy, but we did sequence the correct one first time this time. Read more about the blackberry saga.

Fen Raft Spider - more popular than beavers, apparently. Image credit: Helen Smith, www.dolomedes.org.uk

4. Fen raft spiders are more popular than beavers

In a public vote, the fen raft spider won out over the beaver to have its genome sequenced.

Both were contenders in the flourishing category of the project. Over 5,000 votes were cast in total, as part of “I’m A Scientist Get Me Out Of Here”.

Scottish featherworts are a lonely bunch: they’re all male and their female partners are almost half a world away. Image credit: David Freeman, RSPB

5. All the featherworts in Scotland are male

Their potential partners are over 4,500 miles away in the Himalayas.

Botanists don’t know when the populations split, or how they got there. They only reproduce clonally in Scotland, and so it is uncertain how long they can last in this way.

Bush crickets have issues #1 – their genomes are more than four times bigger than we expected. Image credit: Richard Bartz

6. Genomes are not always what you expect

We estimated that the genome of the bush cricket would be 2Gb, about two-thirds the size of the human genome. We were wrong.

The estimate was based on the average cricket genome in the Animal Genome Size Database. But in fact it is about 2.5 times larger than the human genome, coming in at 8.5Gb.

Read more about how this affected the sequencing.
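As a quick check of these figures, the ratios can be worked out directly. The 2Gb estimate and the 8.5Gb result come from the text above; the human genome size of roughly 3.1Gb is an assumption added here for the comparison.

```python
estimated_gb = 2.0   # original estimate for the bush cricket (from the text)
measured_gb = 8.5    # size found by sequencing (from the text)
human_gb = 3.1       # assumption: approximate size of the human genome

# How far off was the estimate?
ratio_vs_estimate = measured_gb / estimated_gb

# How do the estimate and the real genome compare with the human genome?
estimate_vs_human = estimated_gb / human_gb
measured_vs_human = measured_gb / human_gb

print(f"Measured vs estimate: {ratio_vs_estimate:.2f}x")
print(f"Estimate vs human:    {estimate_vs_human:.2f}x")
print(f"Measured vs human:    {measured_vs_human:.2f}x")
```

Under these assumptions the genome is a little over four times the original estimate, and somewhere between two and a half and three times the size of the human genome.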

7. It’s good to share

We knew this already, but this project has been a huge collaborative effort. It wouldn’t have been possible without scientists giving their time and sharing their expertise.

The Natural History Museum are a key partner for the 25 genomes project. They are helping with species identification and collection, as well as providing a link to natural historians and species experts across the UK.

The sequencing itself wouldn’t have been possible without PacBio. They have provided a machine for the project, along with expert technical support to enable the sequencing of the new species.

Our other collaborators include EMBL-EBI, The National Trust, The Wildlife Trust, Royal Society for the Protection of Birds (RSPB), Nottingham Trent University, Edinburgh University, 10x Genomics, Illumina and many more. See the full list here.

Bush crickets have issues #2 - they have cannibal tendencies. Image credit: Richard Bartz

8. Don’t put bush crickets in a box together

They eat each other (or parts of each other).

Scallops are 20 times more genetically diverse than humans. Image credit: Asbjorn Hansen

9. Scallops are more diverse than people

We’ve found that scallops have 20 times the diversity of humans.

The king scallop was sequenced in the dangerous category of creatures. Human genomes are just 0.1 per cent different to each other – that is, only 0.1 per cent of your DNA code is different to any other person on the planet.

We have a pretty good idea why human genomes are so similar. It’s likely that events in our evolutionary past, like ice ages or infectious diseases, caused a genomic bottleneck, which meant only a small group survived.

In scallops, 1.7 per cent of the DNA differs between any given individuals.

Using PacBio machines, we read 25 new genome sequences in less than 10 months. Image credit: Wellcome Sanger Institute, Genome Research Limited

10. We can go faster than we thought

This project started in January 2018. We’re barely into October.

We’ve sequenced 25 new genomes in less than 10 months.

The PacBio machines we are using have doubled the amount of data they produce, per run, in the last 12 months. Next year, they will quadruple capacity.



25 Genomes, Human Cell Atlas, Influencing Policy, Sanger Life, Sanger Science

25 years of pushing the scientific boundaries

By: Alison Cranage
Date: 01.10.18

The Sanger Institute was set up to uncover the code of life – the human genome. We opened our doors 25 years ago and became the largest single contributor to the Human Genome Project. The principles that sat behind those endeavours are still fundamental – tackling the biggest challenges, openness and collaboration. Those principles have also helped to make Sanger one of the world’s leaders in genomics and biodata.

The Human Genome Project transformed science. The seemingly simple order of four letters of DNA changes how we understand life. Vast new areas of research have opened up, impacting biology, medicine, agriculture, the environment, businesses and governments.

Alongside our sequencing facilities, our activities and research have grown to utilise genomic knowledge. Now we are using genomics to give us an unprecedented understanding of human health, disease and life on earth.


Read our original press release from 2003 announcing the completion of the Human Genome by clicking on the image above

Sequencing at scale

From the completion of the first human genome in 2003, we moved to the 1,000 and 10,000 genomes projects. Being able to compare sequences between individuals enables the understanding of diversity, evolution and the genetic basis of disease.

One of our latest projects is to work with UK Biobank to sequence the genomes of 50,000 individuals. Participants have already provided a wealth of data about their health and their lives – from blood samples to details of their diet. Linking this information to sequence data means we can understand more than ever before about the connections between our genomes and our health.

Kamilah the gorilla. Image courtesy of San Diego Zoo. To read about our work with the gorilla genome, please click the image

Across a wide range of species

Sanger researchers also sequence the genomes of pathogens and other organisms, as well as people. We have published the genomes of thousands of species – from deadly bacteria to worms to the gorilla. This enables research into evolution, infections, drug resistance, outbreaks, symbiosis, biology and host parasite interactions.


The cumulative amount of DNA the Sanger Institute has read over time

At increasing speed and accuracy

Our sequencing teams, led by Dr Cordelia Langford, are constantly developing the technology to improve both accuracy and speed. In early 2018, we celebrated sequencing over five petabases of DNA (if you typed it all out, it would take 23 million years). The first petabase took just over five years to produce. The fifth took just 169 days. The amount of genomic data now rivals that of the biggest data sources in the world – YouTube, Twitter and astrophysics.
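The figures in this paragraph can be sanity-checked with a little arithmetic. The sketch below takes its numbers from the text (five petabases, 23 million years of typing, about five years for the first petabase and 169 days for the fifth). Treating "just over five years" as exactly five 365-day years, and a petabase as 10^15 bases, are simplifying assumptions.

```python
PETABASE = 10**15  # assumption: one petabase = 10^15 DNA letters

total_bases = 5 * PETABASE
typing_years = 23_000_000
seconds_per_year = 365.25 * 24 * 3600

# Typing speed implied by "23 million years" for five petabases.
letters_per_second = total_bases / (typing_years * seconds_per_year)

# Speed-up between the first and the fifth petabase.
first_petabase_days = 5 * 365   # assumption: "just over five years" as 5 years
fifth_petabase_days = 169
speed_up = first_petabase_days / fifth_petabase_days

# Average daily output while producing the fifth petabase.
bases_per_day = PETABASE / fifth_petabase_days

print(f"Implied typing speed: {letters_per_second:.1f} letters per second")
print(f"Speed-up, first to fifth petabase: {speed_up:.1f}x")
print(f"Average output for the fifth petabase: {bases_per_day:.2e} bases per day")
```

On these assumptions the typing comparison implies a brisk but human typing speed of about seven letters a second, and the fifth petabase was produced more than ten times faster than the first.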


We run the largest life sciences data centre in Europe

Supported by Europe’s largest life sciences data centre

The Sanger Institute is not only developing sequencing technology but also leading research in computational science, IT and bioinformatics, developing new ways to store and analyse petabytes of genomic and bio-data.

From sequence to clinic

How genome sequencing, or the sequence of any given individual, can be used hasn’t always been clear. But in the case of rare genetic diseases, it can change lives.


To read more about the Deciphering Developmental Disorders project, please click on the image above

Giving families an answer

The Deciphering Developmental Disorders (DDD) study started 8 years ago, led by Dr Matt Hurles at the Sanger Institute. Over 13,600 children with rare developmental conditions, but without a diagnosis, joined the study. Sanger researchers, working together with clinical geneticists, have used genome sequencing to diagnose their conditions. 40 per cent of the children now have a diagnosis – giving the families some of the answers they were searching for. Knowing the genetic cause of a condition can help doctors manage it, help families connect with others as well as plan for the future.

Watch our video about tracking MRSA in real time

Stopping outbreaks in hospitals

The ability of researchers to rapidly sequence and analyse bacterial genomes is also leading to advances for patients.

Dr Julian Parkhill and colleagues showed it was possible to track an MRSA outbreak in a neonatal ward in real-time. By sequencing MRSA isolates from patients and staff, they could track the outbreak, following its path from person to person. This enables clinicians to prevent further transmission and bring the outbreak under control.

Now, it is UK policy to sequence the genomes of pathogens in an outbreak.

Watch our video showing global tracking of infectious disease

Fighting epidemics at a global scale

But disease knows no borders. Pathogens can easily spread around the globe. Professor David Aanensen, group leader at the Sanger Institute, is also Director of the recently established Centre for Genomic Pathogen Surveillance. The centre co-ordinates global surveillance of pathogens (such as MRSA and the flu virus) using whole genome sequencing. The data is openly available. Countries around the world can monitor the rise and spread of pathogens as well as their growing resistance to antibiotics. This enables swift action – with the aim of stopping transmission and saving lives.

The forefront of human genomics

The rapid development of technology has led to the ability of researchers to sequence the DNA, or RNA, from a single cell. Previously, much larger quantities of material were needed. Single cell RNA sequencing is a powerful tool. It allows the study of an individual cell’s activity, functions and composition. And high-throughput machines mean hundreds of thousands of cells can be analysed at once.

To view the full infographic for the Human Cell Atlas project, please click on the image above

Capturing every type of cell in the human body, one at a time

The Human Cell Atlas is capitalising on these advances. The international collaboration, launched in 2016, is co-led by Dr Sarah Teichmann at the Sanger Institute. Scientists are using next-generation sequencing to sequence 30-100 million single cells from the human body – out of a total of roughly 37 trillion. The aim is to create a comprehensive, 3D reference map of all human cells. This will lead to a deeper understanding of cells as the building blocks of life. It will form a new basis for understanding human health and diagnosing, monitoring, and treating disease.

Like the human genome project before it, this huge project will disrupt science and human biology. And like the human genome project it will drive technology to make it possible.

The diversity of life

Beyond human health, genome sequence data allows the study of evolution, biology and biodiversity.


To read more about our 25 Genomes Project, please click on the image above

25 Genomes for 25 years

For our 25th anniversary we have sequenced a more diverse range of species than ever before: 25 species that represent biodiversity in the UK, from the golden eagle to the humble blackberry. Sequencing new species will push development of our technologies as each presents unique challenges. The sequences themselves will aid research into population genetics, evolution, biodiversity management, conservation and climate change.

But 25 species is just the beginning. Every single living thing has a genome, made up of exactly the same molecules of DNA or RNA. We want to uncover how the order of those molecules leads to the diversity of life on earth.


To see the full sized tree of life diagram, please click on the image above

It took 13 years to sequence the first human genome. When the project began, no-one knew where it would lead. Now we sequence the equivalent of one gold-standard (30x) human genome in 24 minutes – faster and deeper genomic insights are enabling discoveries that improve health and our understanding of biology. These insights are happening right now, and they will lead to unimagined benefits for future generations – all possible from a sequence of four letters of DNA code.



25 Genomes, Sanger Life, Sanger Science

25 Genomes at New Scientist Live

By: Alison Cranage
Date: 25.09.18

Alongside robots, slime and VR machines, Sanger researchers were at New Scientist Live last week – talking genomes. Sarah Teichmann was sharing the latest on the Human Cell Atlas Project and Peter Campbell finished a wonderful weekend of sharing the greatest stories from science by talking a fascinated audience through the latest on cancer science. On the main stage it was our 25 Genomes Project being shared with an intrigued audience – many keen to understand more about the genomes of 25 UK species, from catfish to blackberries.

Julia Wilson and Cordelia Langford from the Sanger Institute took to the stage alongside Tim Littlewood from the Natural History Museum and Fergal Martin from the EMBL-European Bioinformatics Institute. They were discussing the project to sequence the genomes of 25 British species for the first time.

How it all began

Mike Dilger, TV broadcaster and naturalist, was asking the questions – first wondering how the project started.

“Only by understanding these species much better can we ever hope to protect our planet for ourselves and all the other species with which we share it.”

Mike Dilger, BBC One Show broadcaster and naturalist


The 25 Genomes Project being discussed at New Scientist Live. From left to right: Mike Dilger, Julia Wilson, Tim Littlewood, Cordelia Langford and Fergal Martin

Julia, Associate Director at the Sanger Institute, explained: “It came about because it’s our 25th anniversary. We celebrated with some parties, but we also wanted to leave a scientific legacy. And at the same time we wanted to celebrate the staff that we have at Sanger who are experts in DNA sequencing.”

It was a tough task to narrow down the ~66,000 species in the UK to just 25.

So the Sanger Institute connected with the Natural History Museum to help. The Museum is home to over 80 million specimens from around the world, and Tim is providing the link between the Sanger Institute and natural historians who have detailed knowledge of the 66,000 UK species.

“Every species has a story to tell – it needs its champion.”

Tim Littlewood, Head of Life Sciences at the Natural History Museum


The 25 Genomes that the Wellcome Sanger Institute is sequencing to celebrate its 25th Anniversary.

Categories of species helped the team to focus: flourishing, floundering, dangerous, iconic and cryptic. And every species had to have a valid scientific reason for sequencing its genome.

Julia continued: “We also realised that the great British public are fascinated by the rich heritage and diversity of life in the UK and so we wanted a project that would resonate not just with our scientists and scientists beyond but a project that would pique the interest of the general public as well.”

So the Public Engagement team at the Wellcome Genome Campus got together with “I’m A Scientist Get Me Out Of Here” to organise a public vote for the final five species – one from each category.

Rising to the challenge


The New Zealand flatworm – whose DNA has proved to be particularly difficult to extract

Mike asked the panel about the challenges of sequencing such a diverse range of creatures.

There was talk of ‘exploding flatworm goop’, tough plant skins and ‘difficult cellular structures’.

“We’re outside our comfort zone,” Julia admitted. “But that’s a good thing and is helping us explore and learn how to overcome these new challenges.”

Cordelia Langford, Head of Scientific Operations at the Sanger Institute, described how the sequencing teams have had to change and optimise protocols to deal with the new organisms – but the lessons learned have had huge benefits.

“Sequencing of 25 genomes is setting the foundation for an enormously ambitious future. Our partnership with PacBio will help develop technology we need. We’ll learn a lot from the challenges of this project.”

The teams are applying this new knowledge to sequencing human genomes, refining their approach.

The first human genome took 13 years and billions of dollars. Now, the Sanger Institute sequences the equivalent of a human genome in just 24 minutes, at a fraction of the cost.

Fergal described the excitement of sequencing a species for the first time. “It’s like a jigsaw. We have tiny fractions filled in. We don’t know what the big picture looks like. Once we fill it in we will have new questions, new science.”

Why sequence these genomes? What might you find?


Grey squirrels can resist the squirrel pox virus, but the red squirrel cannot. Comparing the grey squirrel’s genome with that of the red squirrel may show which gene(s) give immunity

Mike turned the panel’s attention to the ‘why’ of the project. Why sequence a genome at all? What do we expect to learn?

Tim was excited about the opportunities: “A massive amount of data is about to turn up. It’s going to reveal aspects of evolution we’ve not even dreamt of.”

Each species has secrets hidden in its genome. Robins can ‘see’ the magnetic fields of the earth – but we don’t know how. Starfish can re-grow limbs if they lose them. Grey squirrels are resistant to the squirrel pox virus whereas native red squirrels aren’t – and they’re dying out. Sequencing the genome will help researchers answer these puzzles. It will also drive research into conservation, climate change and evolution.

Fergal talked about how important it is that the data is publicly available for anyone to use.

“The sooner the data is public, the sooner science can be done on it.”

Fergal Martin, Ensembl Genebuild Project Leader, EMBL-European Bioinformatics Institute


Robins can see magnetic fields; it is hoped that reading their genome might reveal how

The EBI will be storing and publishing the data for the project. They will also be annotating the genomes – marking the positions of genes and other features.

“It shortcuts downstream research. Annotating takes a couple of weeks for us. An individual would take weeks or a year. It allows other researchers to ask more questions,” added Fergal.

Peering into the crystal ball…


Starfish can regrow their limbs. If we can find out which genes give them this ability, we might be able to improve wound healing

Mike asked the panel to consider the future. It’s 15 years since the Human Genome Project was completed. Now 25 new species are being sequenced. What’s next?

Tim described life as variations on a theme, where every species is built from a blueprint of DNA. Sequencing different species will allow researchers to compare those blueprints, to understand the genomic diversity of the UK, and beyond.

Julia summed up: “We’re on the precipice of something even more interesting. Can we scale the software, can we scale the storage? Can we visualise the future? What questions should be asked?

“It’s a feasible and tantalising prospect to scale up even further. Why not think about sequencing 66,000 species?”

About the author:

Alison Cranage is a science writer for the Wellcome Sanger Institute.


The sequencing laboratory at the Wellcome Trust Sanger Institute. Credit: Genome Research Limited.
Sanger Science

SMRTer sequencing

02 March 2015
By John Lees

A PacBio sequencer, which performs SMRT (single-molecule real time) sequencing.
Credit: Genome Research Limited

For many years, short-read sequencing has been the bread and butter of bacterial studies at the Wellcome Trust Sanger Institute. Now, long-read sequencing is helping us to interpret bacterial genomes and understand how they evolve.

Short-read sequencing works by splitting the DNA of various microscopic organisms into small pieces that can then be sequenced. Researchers can then rebuild these small pieces into the whole genome of the bacteria.
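The split-and-rebuild idea can be sketched in a few lines of Python. This is a toy illustration only, not the pipeline used at the Institute: the function names (`shred`, `overlap`, `greedy_assemble`), the 16-base example sequence and the read length are all made up for the example, and real assemblers must also cope with sequencing errors, repeats and both DNA strands. The sketch cuts a sequence into overlapping reads, shuffles them so their positions are lost, and then repeatedly joins the two pieces that share the longest suffix-to-prefix overlap.

```python
import random


def shred(genome, read_len, step):
    """Cut the genome into overlapping reads, one starting every `step` bases."""
    last = len(genome) - read_len
    starts = sorted(set(range(0, last + 1, step)) | {last})
    return [genome[s:s + read_len] for s in starts]


def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of pieces with the largest overlap.

    Toy version: assumes error-free reads and no repeated stretches of
    `min_len` bases or more, so every overlap found is a genuine one.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left that overlaps: leave separate pieces
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical 16-base "genome" in which every 3-base stretch is unique.
genome = "ACGTTGCAATCGGAGC"
reads = shred(genome, read_len=8, step=4)
random.Random(0).shuffle(reads)  # the sequencer does not tell us the order
assembled = greedy_assemble(reads)
```

Because the example sequence contains no repeated three-base stretch, every overlap the code finds is real and the reads rebuild into the original sequence. Repeats longer than the reads break this assumption, which is one reason read length matters.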

This type of sequencing, sometimes referred to as shotgun sequencing, has seen huge drops in price over the past few years. Whereas we used to sequence the genome of one bacterium at a time, we can now sequence many thousands of disease-causing bugs, thus forming a picture of the entire population of each species.

We typically sequence billions of the small DNA pieces, giving reads of sequence around 100 base-pairs in length. This works very well for finding small, single-base differences between samples. These variants are ubiquitous in human DNA, and researchers working in human genetics have carried out studies showing how they relate to disease susceptibility as well as to traits such as height.

In bacteria, however, we often have much larger and more complex variations in sequence that can be unique to a single strain. In these cases, longer reads are necessary to reconstruct the entire sequence.

The recent development of SMRT (single-molecule real time) sequencing has helped us start to tackle this issue. While short-read sequencing makes lots of copies of small DNA segments and then reads them all at once, SMRT reads single DNA molecules one base at a time, producing much longer reads. The process is slower, but the genomes are easier to reconstruct.

SMRT sequencing can be used to analyse function of restriction modification systems in Streptococcus pneumoniae.
Credit: Public Health Image Library, CDC

Not only do we get reads with mean lengths of many thousands of base pairs, we can also report methylation at each base. Methyl markers act as on and off switches on the DNA that can affect how other machinery in the cell interprets the coding sequence.

In a recent news and analysis article, ‘R–M systems go on the offensive’, Rebecca Gladstone and I look at some of the uses of SMRT sequencing for analysing the function of restriction modification systems in the bacterium Streptococcus pneumoniae (a common cause of pneumonia).

Restriction modification systems are a bacterial equivalent of an immune system, cutting invading DNA into pieces while protecting the bacteria’s DNA through methylation. Under constant threat from rapidly evolving viruses, the bacterial population must be able to rapidly switch which sequence pattern they recognise, otherwise the viruses would be able to systematically avoid their defences.
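As a rough illustration of the cut-or-protect logic described above, the Python sketch below models a restriction enzyme as something that cuts DNA wherever it finds its recognition pattern, unless that site has been marked as methylated. Everything here is a simplifying assumption made for the example: the function names, the `GATC` pattern and the short sequence are illustrative, and real enzymes cut at specific positions within or near their site rather than exactly at its start.

```python
def find_sites(dna, motif):
    """Start positions of every occurrence of `motif` in `dna`."""
    return [i for i in range(len(dna) - len(motif) + 1) if dna.startswith(motif, i)]


def digest(dna, motif, methylated_sites=frozenset()):
    """Cut `dna` at the start of each recognition site that is not methylated.

    `methylated_sites` holds the start positions of protected sites.
    Returns the list of resulting fragments.
    """
    cuts = [i for i in find_sites(dna, motif) if i not in methylated_sites]
    fragments, previous = [], 0
    for cut in cuts:
        if cut > previous:  # skip an empty fragment if a site sits at position 0
            fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments


# Illustrative recognition pattern and sequence (assumptions for this sketch).
motif = "GATC"
sequence = "AAGATCTTGATCAA"

# Invading DNA carries no methyl marks, so it is cut at both sites.
invader_fragments = digest(sequence, motif)

# The bacterium's own DNA has both sites methylated, so it stays intact.
own_sites = frozenset(find_sites(sequence, motif))
host_fragments = digest(sequence, motif, methylated_sites=own_sites)
```

In this toy model, switching which pattern the system recognises amounts to changing `motif`: a virus whose DNA happens to avoid the old pattern is no longer safe once the pattern changes.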

A recent study used SMRT sequencing to find the different possible DNA arrangements of the restriction modification system, which allows the bacteria to survive this viral onslaught. In a concurrent study, bacteria with different DNA arrangements were grown in the lab, and SMRT technology was used to find the different methylation patterns these rearrangements cause, and their knock-on effect on virulence.

Taken together, these studies suggest that a mechanism that exists to defend against viruses also has an effect on whether or not the bacteria cause disease in humans – an important finding that may help us better understand why only some of these bacteria lead to illnesses such as pneumonia and meningitis.

Restriction modification systems are proving to be a very important part of bacterial evolution, and new technologies such as SMRT sequencing will continue to advance our ability to understand them.

John Lees is a second-year PhD student at the Wellcome Trust Sanger Institute. He works with Stephen Bentley and Julian Parkhill in the Pathogen Genomics group, and Jeff Barrett in the Medical Genomics group. John’s research involves combining human and pathogen sequencing data derived from cases of bacterial meningitis in the Netherlands. He’s currently interested in developing tools for association analysis, and applying them in this context.


  • Lees J and Gladstone R (2015). R–M systems go on the offensive. Nature Reviews Microbiology. DOI: 10.1038/nrmicro3435
  • Croucher NJ, et al (2014). Diversification of bacterial genome content through distinct mechanisms over different timescales. Nature Communications. DOI: 10.1038/ncomms6471
  • Manso AS, et al (2014). A random six-phase switch regulates pneumococcal virulence via global epigenetic changes. Nature Communications. DOI: 10.1038/ncomms6055

Credit: Luc Viatour / www.Lucnix.be
Sanger Science

Creating a gold-standard, not a rotten, tomato genome

Credit: Luc Viatour / www.Lucnix.be

Recently the full reference genome of the tomato (Solanum lycopersicum) was published in Nature (31 May 2012). Here, at the Wellcome Trust Sanger Institute, some of our sequencing people took part in the international collaboration of 10 countries that developed the DNA sequence. Each research group was tasked with working on a different chromosome, and we sequenced Chromosome 4. By being part of the project we were able to share our experiences and knowledge from producing animal reference genomes to enable the plant genome research teams to work together to deliver high-quality, standardised data.

When the tomato genome sequencing project began, the teams estimated that the genome was 950 million base pairs (950 Mb) in size, split across 12 chromosomes. This was no small undertaking: the genome is about one-third the size of the human genome, whose sequencing had taken a worldwide collaboration 10 years to deliver. In addition, the project had limited funding, meaning that the work needed to be as tightly focused and efficient as possible.

Fortunately, only 25 per cent of the tomato genome contains gene-rich areas, so the project teams agreed that capturing and sequencing only these areas would provide the most valuable information in the most effective way. To achieve this, we used mapping techniques to identify the gene-rich areas and used clone-by-clone sequencing to fully sequence them using the fewest sequencing runs.

Clone-by-clone sequencing

We took clones from existing libraries and digested them with restriction enzymes, producing a fingerprint signature for each. We processed these fingerprint signatures in a database known as FPC (Fingerprint Contigs). Sections of signature in common indicate an overlap between clones, and these overlaps can often be verified if known markers can be placed in them. By knowing where each clone belonged on the chromosome, we were able to select only a minimal set of clones to cover the area of interest. We made the FPC database for all the chromosomes publicly available for the research community.
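Choosing a minimal set of clones to cover a region can be thought of as an interval-covering problem. The Python sketch below is one simple way to do that, not a description of how FPC itself works: it assumes each clone has already been placed at known start and end coordinates, and the function name `minimal_tiling_path`, the clone names and the coordinates are all hypothetical. At each step it picks, from the clones that start at or before the point covered so far, the one that reaches furthest along the chromosome.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Pick the fewest clones that together cover [region_start, region_end).

    `clones` is a list of (name, start, end) tuples with end exclusive.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Consider every clone that starts at or before the covered point.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical clone placements (coordinates in kilobases, for illustration).
clones = [
    ("A", 0, 50),
    ("B", 10, 40),
    ("C", 30, 90),
    ("D", 45, 120),
    ("E", 80, 130),
    ("F", 110, 160),
]
tiling_path = minimal_tiling_path(clones, 0, 160)
```

In this example three of the six clones are enough to span the region. A real project would also insist on a minimum overlap between neighbouring clones so that each join can be verified; this sketch accepts clones that merely touch.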

Fig 1. Screenshot showing the Fingerprint Contigs database. Clones highlighted in red and grey show the minimal tiling path selected for the sequencing project.

Using this approach, we mapped, sequenced and finished the gene-rich clones of Chromosome 4, which was estimated to be roughly 19 Mb long. The UK team was led by Principal Investigators Gerard Bishop from Imperial College London, Graham Seymour from Nottingham University, Glenn Bryan from Scottish Crop Research Institute, and Jane Rogers from the Sanger Institute.

Finishing the genome

However, mapping and sequencing are not the whole story when producing a high-quality reference genome: the sequences need to be pieced together and inconsistencies resolved. In other words, the sequences need to be finished. This can be a lengthy process, especially if a project involves differing standards and approaches. Fortunately, we have long experience in finishing DNA sequencing data from our work on the human, mouse and zebrafish genome projects. So, to enable the other international teams to draw on our experience and to develop the common standards needed for efficient finishing, we organised two International Finishing Workshops.

In these, representatives of the different research groups from across the world met and discussed the various challenges of working with the sequencing data. It was a chance to pool experience and look at efficient ways to progress each data set for each of the chromosomes. Our discussions centred on techniques for improving the data for the clones, as well as on ensuring that the metrics all the teams used to assess the quality of each clone were comparable.

Through meeting together and talking through the issues, the teams ensured that the resulting genomic sequence from all the laboratories involved showed parity. This data was then annotated and made publicly available for the wider Solanaceae research community.

We were also able to make a useful contribution by guiding the project teams through the challenges of incorporating new-technology sequencing data, which the project went on to adopt.

Funding bodies: BBSRC, EU-SOL, DEFRA and the Wellcome Trust