From the Wellcome Sanger Institute, a charitably funded genomic research organisation

10 surprises from sequencing 25 new species

By: Alison Cranage
Date: 04.10.18

Sequencing human genomes is now routine at the Sanger Institute. Bacteria, yeast, worms, malaria parasites and other pathogens are also regularly sequenced in their thousands. Our people are pretty well known for sequencing the human genome, but we’ve also contributed to the first genome sequences of many other species, including the mouse, rat, zebrafish, pig and gorilla.

The 25 genomes project is an entirely different beast. It’s posing some new, and frankly very odd, challenges. The diversity of the new species means we’ve had a steep learning curve. Here’s a peek at some of the weird and wonderful things we’ve discovered so far:

New Zealand flatworms will explode if you freeze them – not terribly helpful when trying to extract DNA from samples… Image Credit: S. Rae, Wikimedia Commons

1. Don’t freeze flatworms

They explode.

You may well ask why we’d freeze them in the first place. Freezing samples (in this case, whole worms) is standard practice for storing them ready for DNA extraction.

Freezing New Zealand flatworms didn’t go so well though. The resulting sticky goop proved difficult to handle… and to get DNA from.

Is this the Oxford ragwort you are looking for? The best way to know is to take a picture and send it to an Oxford expert… Image credit: Rosser1954, Wikimedia Commons

2. It’s good to get a second opinion when you’re identifying something

The Oxford ragwort was chosen for sequencing in our flourishing category. We have ragwort growing here on campus, so we took a plant for sequencing.

But once we started, we soon realised it was not the ragwort we were looking for. The plant we had was hexaploid (it has 6 copies of its genome in every cell). The Oxford ragwort, which we were hoping to sequence, is diploid (it has 2 copies).

We sent a photo of the plant to an expert at Oxford University, who informed us we had the common ragwort.

There are 300+ species of blackberry – and telling them apart can literally take years of observation. Image credit: Fir0002, Wikimedia Commons

3. There are over 300 species of blackberry in the UK

Yes, 300+.

They differ in a whole host of characteristics: sweetness, number of drupelets (the little blobs that make up the fruit), colour, size, thorns, flowers, lifecycle and more.

Finding the right one wasn’t easy, but this time we did sequence the correct species first time. Read more about the blackberry saga.

Fen Raft Spider – more popular than beavers, apparently. Image credit: Helen Smith,

4. Fen raft spiders are more popular than beavers

In a public vote, the fen raft spider won out over the beaver to have its genome sequenced.

Both were contenders in the flourishing category of the project. Over 5,000 votes were cast in total, as part of “I’m A Scientist Get Me Out Of Here”.

Scottish featherworts are a lonely bunch: they’re all male, and their female partners are almost half a world away. Image credit: David Freeman, RSPB

5. All the featherworts in Scotland are male

Their potential partners are over 4,500 miles away in the Himalayas.

Botanists don’t know when the populations split, or how the Scottish plants got there. In Scotland they reproduce only clonally, so it is uncertain how long they can last this way.

Bush crickets have issues #1 – their genomes are more than four times bigger than we expected. Image credit: Richard Bartz

6. Genomes are not always what you expect

We estimated that the genome of the bush cricket would be 2Gb, about two-thirds the size of the human genome. We were wrong.

The estimate was based on the average cricket genome size in the Animal Genome Size Database. In fact, the bush cricket genome is more than 2.5 times the size of the human genome, coming in at 8.5Gb.

Read more about how this affected the sequencing.
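How far off the estimate was is easy to check. The short Python sketch below reworks the numbers in this section; the human genome size of about 3.1 Gb is our own assumption, since the post gives only the 2 Gb estimate, the 8.5 Gb result and the comparison with the human genome.

```python
# Back-of-the-envelope genome-size ratios for the bush cricket.
# Assumption: a haploid human genome of roughly 3.1 gigabases (Gb);
# the 2 Gb estimate and 8.5 Gb result are the figures quoted in the post.

HUMAN_GB = 3.1      # assumed human genome size
ESTIMATE_GB = 2.0   # pre-sequencing estimate for the bush cricket
MEASURED_GB = 8.5   # bush cricket genome size reported in the post


def times_larger(a_gb: float, b_gb: float) -> float:
    """Return how many times larger genome a is than genome b."""
    if b_gb <= 0:
        raise ValueError("genome sizes must be positive")
    return a_gb / b_gb


if __name__ == "__main__":
    print(f"estimate / human:    {times_larger(ESTIMATE_GB, HUMAN_GB):.2f}")
    print(f"measured / human:    {times_larger(MEASURED_GB, HUMAN_GB):.2f}")
    print(f"measured / estimate: {times_larger(MEASURED_GB, ESTIMATE_GB):.2f}")
```

With these inputs the estimate is about 0.65 of a human genome, while the measured genome is roughly 2.7 times a human genome and more than four times the original estimate.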

7. It’s good to share

We knew this already, but this project has been a huge collaborative effort. It wouldn’t have been possible without scientists giving their time and sharing their expertise.

The Natural History Museum are a key partner for the 25 genomes project. They are helping with species identification and collection, as well as providing a link to natural historians and species experts across the UK.

The sequencing itself wouldn’t have been possible without PacBio. They have provided a machine for the project and provided expert technical support to enable the sequencing of the new species.

Our other collaborators include EMBL-EBI, The National Trust, The Wildlife Trust, Royal Society for the Protection of Birds (RSPB), Nottingham Trent University, Edinburgh University, 10x Genomics, Illumina and many more. See the full list here.

Bush crickets have issues #2 – they have cannibal tendencies. Image credit: Richard Bartz

8. Don’t put bush crickets in a box together

They eat each other (or parts of each other).

Scallops are 20 times more genetically diverse than humans. Image credit: Asbjorn Hansen

9. Scallops are more diverse than people

We’ve found that scallops have nearly 20 times the diversity of humans.

The king scallop was sequenced in the dangerous category. Human genomes are just 0.1 per cent different from each other – that is, only 0.1 per cent of your DNA code differs from that of any other person on the planet.

We have a pretty good idea why human genomes are so similar. It’s likely that events in our evolutionary past, such as ice ages or infectious diseases, caused a genomic bottleneck in which only a small group of our ancestors survived.

In scallops, 1.7 per cent of the DNA differs between any two individuals.
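One simple way to express "the percentage of DNA that differs" between two individuals is the proportion of aligned positions at which their sequences disagree, known as the p-distance. The Python sketch below is a toy illustration under that assumption, using made-up ten-letter sequences; it is not a claim about how the scallop figure was calculated, and real genome-wide estimates also have to handle gaps, sequencing errors and missing data.

```python
# Toy p-distance: the fraction of aligned positions at which two sequences differ.
# Illustrative only; genome-wide estimates must also handle gaps,
# sequencing errors and missing data.

HUMAN_DIFF = 0.001    # 0.1 per cent, as quoted in the post
SCALLOP_DIFF = 0.017  # 1.7 per cent, as quoted in the post


def p_distance(seq_a: str, seq_b: str) -> float:
    """Return the fraction of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


if __name__ == "__main__":
    # Hypothetical ten-base sequences with a single difference.
    print(p_distance("ACGTACGTAC", "ACGTTCGTAC"))
    # How many times more diverse scallops are than humans, using the post's figures.
    print(round(SCALLOP_DIFF / HUMAN_DIFF))
```

The toy pair differs at one position in ten, a p-distance of 0.1. With the post's figures, 1.7 per cent against 0.1 per cent gives a ratio of 17, in line with the roughly twentyfold figure quoted in this section.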

Using PacBio machines, we read 25 new genome sequences in less than 10 months. Image credit: Wellcome Sanger Institute, Genome Research Limited

10. We can go faster than we thought

This project started in January 2018. We’re barely into October.

We’ve sequenced 25 new genomes in less than 10 months.

The PacBio machines we are using have doubled the amount of data they produce, per run, in the last 12 months. Next year, they will quadruple capacity.

About the author:

Alison Cranage is a science writer for the Wellcome Sanger Institute.


25 years of pushing the scientific boundaries

By: Alison Cranage
Date: 01.10.18

The Sanger Institute was set up to uncover the code of life – the human genome. We opened our doors 25 years ago and became the largest single contributor to the Human Genome Project. The principles behind those endeavours are still fundamental: tackling the biggest challenges, openness and collaboration. Those principles have also helped to make Sanger one of the world’s leaders in genomics and biodata.

The Human Genome Project transformed science. The seemingly simple order of four letters of DNA changes how we understand life. Vast new areas of research have opened up, impacting biology, medicine, agriculture, the environment, businesses and governments.

Alongside our sequencing facilities, our research has grown to put genomic knowledge to use. Now we are using genomics to give us an unprecedented understanding of human health, disease and life on earth.



Sequencing at scale

From the completion of the first human genome in 2003, we moved to the 1,000 and 10,000 genomes projects. Comparing sequences between individuals helps us understand diversity, evolution and the genetic basis of disease.

One of our latest projects is to work with UK Biobank to sequence the genomes of 50,000 individuals. Participants have already provided a wealth of data about their health and their lives – from blood samples to details of their diet. Linking this information to sequence data means we can understand more than ever before about the connections between our genomes and our health.

Kamilah the gorilla. Image courtesy of San Diego Zoo.

Across a wide range of species

Sanger researchers also sequence the genomes of pathogens and other organisms, as well as people. We have published the genomes of thousands of species, from deadly bacteria to worms to the gorilla. This enables research into evolution, infections, drug resistance, outbreaks, symbiosis, biology and host-parasite interactions.


The cumulative amount of DNA the Sanger Institute has read over time

At increasing speed and accuracy

Our sequencing teams, led by Dr Cordelia Langford, are constantly developing the technology to improve both accuracy and speed. In early 2018, we celebrated sequencing over five petabases of DNA (if you typed it all out, it would take 23 million years). The first petabase took just over five years to produce; the fifth took just 169 days. The amount of genomic data now rivals that of the biggest data sources in the world: YouTube, Twitter and astrophysics.
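The "23 million years" comparison is easy to sanity-check. The Python sketch below assumes that "five petabases" means 5 × 10^15 DNA letters, typed one character per base without a break, in 365.25-day years (all our assumptions); it works out the typing speed that figure implies and the average daily output needed to produce the fifth petabase in 169 days.

```python
# Sanity checks on the sequencing figures quoted above.
# Assumptions: one typed character per DNA base, typing non-stop,
# and a 365.25-day year.

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
PETABASE = 1e15  # bases


def typing_rate(bases: float, years: float) -> float:
    """Characters per second needed to type `bases` letters in `years` years."""
    return bases / (years * SECONDS_PER_YEAR)


def bases_per_day(bases: float, days: float) -> float:
    """Average number of bases produced per day."""
    return bases / days


if __name__ == "__main__":
    rate = typing_rate(5 * PETABASE, 23e6)
    print(f"implied typing speed: {rate:.1f} characters per second")
    print(f"fifth petabase: {bases_per_day(PETABASE, 169):.2e} bases per day")
```

Under these assumptions the 23-million-year figure implies about seven characters a second, typed non-stop, and the fifth petabase works out at roughly six trillion bases a day.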


We run the largest life sciences data centre in Europe

Supported by Europe’s largest life sciences data centre

The Sanger Institute is not only developing sequencing technology but also leading research in computational science, IT and bioinformatics, developing new ways to store and analyse petabytes of genomic and bio-data.

From sequence to clinic

It hasn’t always been clear how genome sequencing, or an individual’s genome sequence, could be used in the clinic. But in the case of rare genetic diseases, it can change lives.



Giving families an answer

The Deciphering Developmental Disorders (DDD) study started 8 years ago, led by Dr Matt Hurles at the Sanger Institute. Over 13,600 children with rare developmental conditions, but without a diagnosis, joined the study. Sanger researchers, working together with clinical geneticists, have used genome sequencing to diagnose their conditions. Forty per cent of the children now have a diagnosis, giving the families some of the answers they were searching for. Knowing the genetic cause of a condition can help doctors manage it, and help families connect with others and plan for the future.

Stopping outbreaks in hospitals

The ability of researchers to rapidly sequence and analyse bacterial genomes is also leading to advances for patients.

Dr Julian Parkhill and colleagues showed it was possible to track an MRSA outbreak in a neonatal ward in real time. By sequencing MRSA isolates from patients and staff, they could follow the outbreak’s path from person to person. This enables clinicians to prevent further transmission and bring the outbreak under control.

Now, it is UK policy to sequence the genomes of pathogens in an outbreak.

Fighting epidemics at a global scale

But disease knows no borders. Pathogens can easily spread around the globe. Professor David Aanensen, group leader at the Sanger Institute, is also Director of the recently established Centre for Genomic Pathogen Surveillance. The centre co-ordinates global surveillance of pathogens (such as MRSA and the flu virus) using whole genome sequencing. The data is openly available. Countries around the world can monitor the rise and spread of pathogens as well as their growing resistance to antibiotics. This enables swift action – with the aim of stopping transmission and saving lives.

The forefront of human genomics

Rapid technological development means researchers can now sequence the DNA, or RNA, from a single cell. Previously, much larger quantities of material were needed. Single-cell RNA sequencing is a powerful tool: it allows the study of an individual cell’s activity, functions and composition. And high-throughput machines mean hundreds of thousands of cells can be analysed at once.

Capturing every type of cell in the human body, one at a time

The Human Cell Atlas is capitalising on these advances. The international collaboration is co-led by Dr Sarah Teichmann at the Sanger Institute. Launched in 2016, the project is using next-generation sequencing to sequence 30 to 100 million single cells from the human body, out of a total of roughly 37 trillion. The aim is to create a comprehensive, 3D reference map of all human cells. This will lead to a deeper understanding of cells as the building blocks of life. It will form a new basis for understanding human health and diagnosing, monitoring, and treating disease.

Like the human genome project before it, this huge project will disrupt science and human biology. And like the human genome project it will drive technology to make it possible.

The diversity of life

Beyond human health, genome sequence data allows the study of evolution, biology and biodiversity.


25 Genomes for 25 years

For our 25th anniversary we have sequenced a more diverse range of species than ever before: 25 different species that represent biodiversity in the UK, from the golden eagle to the humble blackberry. Sequencing new species will push the development of our technologies, as each presents unique challenges. The sequences themselves will aid research into population genetics, evolution, biodiversity management, conservation and climate change.

But 25 species is just the beginning. Every living thing has a genome, built from the same chemical building blocks of DNA or RNA. We want to uncover how the order of those building blocks leads to the diversity of life on earth.


It took 13 years to sequence the first human genome. When the project began, no-one knew where it would lead. Now we sequence the equivalent of one gold-standard (30x) human genome every 24 minutes. Faster and deeper genomic insights are enabling discoveries that improve health and our understanding of biology. These insights are happening right now, and they will lead to unimagined benefits for future generations – all possible from a sequence of four letters of DNA code.
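To put "30x in 24 minutes" in context: 30x coverage means each base of the genome is read about 30 times on average, so one such genome corresponds to roughly 30 times the genome's length in sequenced bases. The Python sketch below assumes a 3.1-billion-base human genome and round-the-clock running (both our assumptions, not figures from the post) and converts the 24-minute figure into a daily output.

```python
# Converting "one 30x human genome every 24 minutes" into daily output.
# Assumptions: a 3.1-billion-base haploid human genome and
# sequencing running continuously, 24 hours a day.

HUMAN_GENOME_BASES = 3.1e9
COVERAGE = 30
MINUTES_PER_GENOME = 24
MINUTES_PER_DAY = 24 * 60


def daily_output(genome_bases: float, coverage: float, minutes_per_genome: float) -> float:
    """Sequenced bases per day if one genome at `coverage` takes `minutes_per_genome`."""
    bases_per_genome = genome_bases * coverage
    genomes_per_day = MINUTES_PER_DAY / minutes_per_genome
    return bases_per_genome * genomes_per_day


if __name__ == "__main__":
    per_day = daily_output(HUMAN_GENOME_BASES, COVERAGE, MINUTES_PER_GENOME)
    print(f"genomes per day: {MINUTES_PER_DAY / MINUTES_PER_GENOME:.0f}")
    print(f"bases per day: {per_day:.2e}")
    print(f"days per petabase: {1e15 / per_day:.0f}")
```

Under these assumptions that is 60 genomes, or about 5.6 trillion bases, a day: roughly one petabase every 180 days, the same order as the 169 days this post quotes for the fifth petabase.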

About the author:

Alison Cranage is a science writer for the Wellcome Sanger Institute.


25 Genomes at New Scientist Live

By: Alison Cranage
Date: 25.09.18

Alongside robots, slime and VR machines, Sanger researchers were at New Scientist Live last week, talking genomes. Sarah Teichmann shared the latest on the Human Cell Atlas Project, and Peter Campbell closed a wonderful weekend of great science stories by talking a fascinated audience through the latest in cancer science. On the main stage, our 25 Genomes Project was shared with an intrigued audience – many keen to understand more about the genomes of 25 UK species, from catfish to blackberries.

Julia Wilson and Cordelia Langford from the Sanger Institute took to the stage alongside Tim Littlewood from the Natural History Museum and Fergal Martin from the EMBL-European Bioinformatics Institute. They were discussing the project to sequence the genomes of 25 British species for the first time.

How it all began

Mike Dilger, TV broadcaster and naturalist, was asking the questions – first wondering how the project started.

“Only by understanding these species much better can we ever hope to protect our planet for ourselves and all the other species with which we share it.”

Mike Dilger, BBC One Show broadcaster and naturalist


The 25 Genomes Project being discussed at New Scientist Live. From left to right: Mike Dilger, Julia Wilson, Tim Littlewood, Cordelia Langford and Fergal Martin

Julia, Associate Director at the Sanger Institute, explained: “It came about because it’s our 25th anniversary. We celebrated with some parties, but we also wanted to leave a scientific legacy. And at the same time we wanted to celebrate the staff that we have at Sanger who are experts in DNA sequencing.”

It was a tough task to narrow down the ~66,000 species in the UK to just 25.

So the Sanger Institute connected with the Natural History Museum to help. The museum is home to over 80 million specimens from around the world, and Tim is providing the link between the Sanger Institute and natural historians who have detailed knowledge of the 66,000 UK species.

“Every species has a story to tell – it needs its champion.”

Tim Littlewood, Head of Life Sciences from the Natural History Museum


The 25 Genomes that the Wellcome Sanger Institute is sequencing to celebrate its 25th Anniversary.

Categories of species helped the team to focus: flourishing, floundering, iconic, cryptic and dangerous. And every species had to have a valid scientific reason for sequencing its genome.

Julia continued: “We also realised that the great British public are fascinated by the rich heritage and diversity of life in the UK and so we wanted a project that would resonate not just with our scientists and scientists beyond but a project that would pique the interest of the general public as well.”

So the Public Engagement team at the Wellcome Genome Campus got together with “I’m A Scientist Get Me Out Of Here” to organise a public vote for the final five species – one from each category.

Rising to the challenge


The New Zealand flatworm – whose DNA has proved to be particularly difficult to extract

Mike asked the panel about the challenges of sequencing such a diverse range of creatures.

There was talk of ‘exploding flatworm goop’, tough plant skins and ‘difficult cellular structures’.

“We’re outside our comfort zone,” Julia admitted. But that’s a good thing and is helping us explore and learn how to overcome these new challenges.

Cordelia Langford, Head of Scientific Operations at the Sanger Institute, described how the sequencing teams have had to change and optimise protocols to deal with the new organisms – but the learnings have had huge benefits.

“Sequencing of 25 genomes is setting the foundation for an enormously ambitious future. Our partnership with PacBio will help develop technology we need. We’ll learn a lot from the challenges of this project.”

The teams are applying this new knowledge to sequencing human genomes, refining their approach.

The first human genome took 13 years and billions of dollars. Now, the Sanger Institute sequences the equivalent of a human genome in just 24 minutes, at a fraction of the cost.

Fergal described the excitement of sequencing a species for the first time. “It’s like a jigsaw. We have tiny fractions filled in. We don’t know what the big picture looks like. Once we fill it in we will have new questions, new science.”

Why sequence these genomes? What might you find?


Grey squirrels can resist the squirrel pox virus, but the red squirrel cannot. Comparing the grey squirrel’s genome with that of the red squirrel may show which gene(s) give immunity

Mike turned the panel’s attention to the ‘why’ of the project. Why sequence a genome at all? What do we expect to learn?

Tim was excited about the opportunities: “A massive amount of data is about to turn up. It’s going to reveal aspects of evolution we’ve not even dreamt of.”

Each species has secrets hidden in its genome. Robins can ‘see’ the Earth’s magnetic field – but we don’t know how. Starfish can re-grow limbs if they lose them. Grey squirrels are resistant to the squirrel pox virus, whereas native red squirrels aren’t – and the reds are dying out. Sequencing these genomes will help researchers solve such puzzles. It will also drive research into conservation, climate change and evolution.

Fergal talked about how important it is that the data is publicly available for anyone to use.

“The sooner the data is public, the sooner science can be done on it.”

Fergal Martin, Ensembl Genebuild Project Leader, EMBL-European Bioinformatics Institute


Robins can see magnetic fields; it is hoped that reading their genome might reveal how

The EBI will be storing and publishing the data for the project. They will also be annotating the genomes – marking on the position of genes and other features.

“It shortcuts downstream research. Annotating takes a couple of weeks for us. An individual would take weeks or a year, it allows other researchers to ask more questions,” added Fergal.

Peering into the crystal ball…


Starfish can regrow their limbs. If we can find out which genes give them this ability, we might be able to improve wound healing

Mike asked the panel to consider the future. It’s 15 years since the Human Genome Project was completed. Now 25 new species are being sequenced. What’s next?

Tim described life as variations on a theme, where every species is built from a blueprint of DNA. Sequencing different species will allow researchers to compare those blueprints, to understand the genomic diversity of the UK, and beyond.

Julia summed up: “We’re on the precipice of something even more interesting. Can we scale the software, can we scale the storage? Can we visualise the future? What questions should be asked?

“It’s a feasible and tantalising prospect to scale up even further. Why not think about sequencing 66,000 species?”

About the author:

Alison Cranage is a science writer for the Wellcome Sanger Institute.


Mosquito in close up. Image credit: CDC/Dr Paul Howell

Building capacity for genomic surveillance of malaria mosquitoes in Africa

By: Alistair Miles
Date: 21/09/2018

In 2009, a group of African entomologists and public health professionals founded the Pan-African Mosquito Control Association (PAMCA). The aim was to bring together mosquito control professionals from across the continent, and provide a platform to build capacity and coordinate efforts to improve mosquito control and prevent diseases like malaria. A few years later, in 2013, we began work at the Sanger Institute on a new project to sequence the genomes of more than 1,000 malaria mosquitoes collected from across Africa. It’s taken time for that work to bear fruit, but the project has now generated a wealth of new data that could be put to practical use.

Children sleeping under an insecticide-treated bednet. Photo credit: Martin Donnelly

Thanks to new funding from the Bill and Melinda Gates Foundation, these efforts are now coming together, and PAMCA has recently invited researchers to propose new projects on mosquito genomics in Africa. Our mosquito team at the Sanger Institute is excited to be supporting those projects, and will sequence the whole genomes of thousands of new mosquitoes collected from locations where we currently have little or no data.

Mosquito-borne diseases, particularly malaria, still have a devastating impact on public health in Africa, and massive efforts are made each year to control mosquitoes. For example, in 2017 the Global Fund paid for 197 million insecticide-treated bednets to be distributed in Africa. This approach has led to major reductions in disease, but brute force can only get you so far. Under this intense and uniform pressure, mosquito populations are rapidly evolving, and insecticide resistance has spread across the continent. As we struggle now to gain the upper hand, those working at the front line of mosquito population monitoring and control have a pivotal role to play.

In an ideal world, every province in every malaria-endemic country would have a well-trained, well-resourced, dedicated team of medical entomologists. Those teams would regularly collect data on local mosquito populations and run experiments to compare different tools and tactics for mosquito control. They would assess whether current mosquito control efforts are still effective, give advice on the best plan of attack for the next season, and raise the alert about any changes in local mosquitoes, such as the emergence or spread of a new form of insecticide resistance.

In some parts of Africa, this vision is not so far from reality. But there is a broad consensus that much more could be done to build capacity for mosquito population monitoring and surveillance. With recent advances in genomics, there is also now an opportunity to equip teams with new tools to collect richer and more relevant data, and to join up data and coordinate efforts across countries. This is why the Gates Foundation and the Sanger Institute are partnering with PAMCA and supporting this new funding call.

Training session in sampling mosquito larvae for community volunteers. Photo credit: Prosper Chaki

The Sanger Institute has committed to providing genome sequencing for all of the new PAMCA projects. The call will fund nine projects in total, each lasting 12 months, and our aim is to sequence the whole genomes of 500 mosquitoes from each project (4,500 mosquitoes in all). A particular focus of this call is to fund projects working in locations where little or no data on mosquito populations has so far been collected. Ironically, these are often areas with high rates of malaria, and so filling in these gaps in our continental map of mosquito populations is vital.

Last year we published results from the largest ever genomic study of mosquitoes, which sequenced Anopheles gambiae mosquitoes, the species primarily responsible for transmitting malaria, collected from eight African countries. We found evidence that insecticide resistance is emerging locally in a number of geographically distinct mosquito populations, but it is also spreading between mosquito populations in different countries, in some cases separated by thousands of kilometres. These findings show that how insecticides are used in one location can have an impact on many other locations, and that mosquitoes, of course, do not respect political borders. The management of insecticide use, therefore, has to be coordinated.

Aedes mosquitoes, which transmit dengue and Zika, can travel over large distances as eggs laid in car tyres. In Anopheles mosquitoes, by contrast, insecticide resistance more likely spreads between populations as adult mosquitoes fly in search of new food and breeding grounds. But although we know that an insecticide resistance gene can find its way into populations as distant as Guinea and Angola, for example, we still don’t know where resistance is emerging, or what routes it can take as it spreads outwards from any given origin. Filling in these gaps in our understanding of mosquito movement and gene flow is a major goal of the new PAMCA projects.

Mosquito larvae. Photo credit: Martin Donnelly

Insecticides are likely to remain an essential component of mosquito control for the foreseeable future. But because of the challenges of resistance, and the significant costs and logistical issues involved in distributing millions of nets and spraying hundreds of thousands of homes each year, efforts are being made to develop alternative methods of mosquito control. New methods based on gene drive, where a selfish gene is introduced into a mosquito population and then spreads to cause the population to crash or become unable to transmit disease, have been proven to work in the lab, and are now being developed for use in the field. There are considerable technical, regulatory and logistical hurdles still to be overcome, but the technology has the potential to transform mosquito control in Africa. Understanding how mosquito populations are connected across Africa is obviously a prerequisite to planning any kind of deployment of gene drive, so sequencing mosquito genomes from across the complete geographical range of the species is all the more important.
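Why a "selfish" gene can spread where an ordinary one cannot can be seen in a textbook-style model of a homing gene drive. The Python sketch below is an idealised illustration, not a model of any specific drive under development: it assumes random mating, a very large population, no fitness cost to carrying the drive, and that in heterozygous mosquitoes the drive copies itself onto the partner chromosome with probability `homing`. A heterozygote then passes the drive to a fraction (1 + homing) / 2 of its offspring instead of the Mendelian half, so the drive frequency p becomes p(1 + homing × (1 − p)) each generation. The model tracks allele frequency only, not population size, so it says nothing about whether a population crashes.

```python
# Idealised homing gene drive: allele frequency over generations.
# Assumptions (illustrative only): random mating, infinite population,
# no fitness cost, and conversion of the wild-type allele in heterozygotes
# with probability `homing`.


def next_frequency(p: float, homing: float) -> float:
    """Drive allele frequency in the next generation.

    Homozygotes (frequency p^2) pass on the drive to all offspring;
    heterozygotes (frequency 2p(1-p)) pass it on with probability (1 + homing) / 2.
    """
    return p * p + 2 * p * (1 - p) * (1 + homing) / 2


def trajectory(p0: float, homing: float, generations: int) -> list[float]:
    """Return drive frequencies from generation 0 up to `generations`."""
    if not (0.0 <= p0 <= 1.0 and 0.0 <= homing <= 1.0):
        raise ValueError("p0 and homing must lie between 0 and 1")
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], homing))
    return freqs


if __name__ == "__main__":
    # Release at 1% frequency: compare no homing (Mendelian) with 90% homing.
    for homing in (0.0, 0.9):
        freqs = trajectory(0.01, homing, 12)
        print(homing, [round(f, 3) for f in freqs])
```

With no homing the frequency stays at 1 per cent, as ordinary Mendelian inheritance predicts; with 90 per cent homing it rises past 99 per cent within about a dozen generations in this idealised model.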

Since its inception, PAMCA has established chapters in 8 countries, formed strategic partnerships with regional bodies and academic institutions, performed an Africa-wide assessment of entomological capacity, and run training workshops on gene drive. PAMCA has also held annual conferences in Kenya, Tanzania, Nigeria and Burkina Faso, bringing together entomologists, researchers, health professionals and members of governmental and non-governmental organisations. From 24-26 September, the 5th annual conference will be held in Victoria Falls, Zimbabwe. I’m excited to be attending the conference for the first time this year, and to be participating in a symposium on mosquito genomics, alongside colleagues from Sanger, the Liverpool School of Tropical Medicine, and PAMCA. It should be a great opportunity to discuss the new funding call. Hopefully the new PAMCA projects will go some way towards increasing capacity both for basic medical entomology and for the analysis and interpretation of genomic data, as well as generating a wealth of new data from contemporary mosquito populations in understudied locations.

Applying for PAMCA funding

Researchers interested in applying for PAMCA funding should see the PAMCA request for proposals document for more information. The closing date for the first round of applications is 3rd October.

About the author:

Alistair Miles is Head of Epidemiological Informatics in the group of Dominic Kwiatkowski, at the University of Oxford, and the Wellcome Trust Sanger Institute.

The Beast from the East? Vespa velutina

Words and pictures by: Alex Cagan
Date: 17.09.18

Prelude: Death from above

The Asian hornet - one of the 25 genomes being read by the Wellcome Sanger Institute

Today, you are a honeybee and today you are going to die.

You enjoyed a summer full of industry, dance and frenetic activity collecting nectar for the hive along with your 10,000 sisters. But all of that is about to change.

They came from the East.

One arrived this morning, unnoticed, a vanguard. After spotting your hive it flew back to its nest to recruit others. Now, a storm is gathering above you. Shadows skate overhead, silent portents of impending calamity.

This darkness is cast by Asian hornets, Vespa velutina. They are specialised honeybee hunters. Each one patrols its own small territory above the hive, its ferocious mandibles and instincts designed to cleave through the carapaces of you and your sisters. Together the hornets create a tightening net from which you cannot escape.

If you were an Asian honeybee the arrival of the first hornet would have triggered one of the most extraordinary defensive maneuvers in the natural world. You and your fellow workers would surround the hornet. Rapidly twitching your wing muscles to generate heat, you would have smothered it in a pulsating mass, cooking the hornet alive in a blaze of cooperative fury before it could call for reinforcements. Together, you might just have survived.

But you are not an Asian honeybee.

You’re a European honeybee and neither you nor your ancestors ever faced such a threat, until now. As such your genetic and behavioural make-up is bereft of the tools or tricks with which you might hope to defend yourselves. What little resistance you can put up is futile.

After their grisly work is done, the hornets take no honey. That is not what brought them here. It is the sinews of you and your sisters that they feast upon.

This morning the chambers of your hive were alive with the buzz of activity. This evening things are different. Drips of honey echo in silence.


25 Genomes: The Asian Hornet

I’m Alex Cagan, a post-doctoral researcher in genetics at the Wellcome Sanger Institute. The Institute turns 25 this year and to celebrate we are sequencing the genomes of 25 species found in the UK that have never had their genomes sequenced before. Over the course of the year I’ll be going ‘behind-the-scenes’ to chronicle this ambitious project in various ways. In doing so I hope to throw some light on the scientists, institutes and species involved in doing this kind of large-scale science and the reasons why it’s being done in the first place.

The Asian hornet is one of these 25 species being sequenced by the Wellcome Sanger Institute. It is one of five species that were chosen by a public vote, in a head-to-head competition with other species. The case for sequencing the Asian hornet was championed by the Sumner lab, a eusocial insect research group based at University College London. They tell us that the Asian hornet is ‘a dangerous invasive species that poses a huge threat to bee populations in the UK and elsewhere in Europe’. Of all the 25 species being sequenced, it is also the newest arrival in the UK. The first confirmed sighting of a nest was in Gloucestershire in September 2016, with a second found in Devon in 2017.

For many, the mere mention of the word ‘hornet’ is enough to set the pulse racing. Even if we’ve never encountered one, the image of a swarm of angry wasps on steroids looms large in the imagination. But what of the reality? Are they truly winged nightmares or is this reputation undeserved? We already have hornets in the UK, so what makes Asian hornets so special? In search of answers I headed to London to meet with Dr Gavin Broad, Principal Curator in charge of insects at the Natural History Museum.

Natural History Museum, London

Behind the scenes

Visiting the NHM is always a joy. Encounters with wonders there as a child inspired me, along with countless others I’m sure, to become a biologist.

I’ve always had a yearning to go behind the scenes and glimpse the vast collections that have never been on display. I have read stories of the museum’s legendary collections, particularly in the excellent book ‘Dry Store Room No. 1’ by Richard Fortey. I imagine endless corridors stacked with shelves containing specimens from all branches of the tree of life, gathered by naturalists across the globe. The biological equivalent of Borges’s fictitious Library of Babel, with specimens that display the near-infinite variety of forms produced in nature. A library of life.

Pigeons at the Natural History Museum

It turns out my imagination is not far off. After signing in as a guest I meet Dr Gavin Broad at his desk. It is covered with a combination of books and wasp specimens of different sizes, encased in tubes or glass-windowed wooden boxes. I’ve come to the right place, I think to myself.

We begin by heading straight to the museum’s collection of Asian hornets. Dr Broad leads me to a corridor lined with tall grey metal cabinets as far as the eye can see. Each one has a number and a brief description of the contents within. After a few seconds I’m already disorientated. We stop at one case that looks to me like all the others. This one is labelled 58 – Vespidae / Vespinae?. He spins a wheel attached to the cabinets and the stacks begin to move, making a central row of cabinets accessible. Opening one of the top doors reveals a stack of wooden shelves that immediately fills the air with the rich smell of timber.

Dr Broad and his collection of Asian hornets at the Natural History Museum, London

Dr Broad retrieves the top two shelves and lays them on a table. Framed in glass are rows upon rows of hornets. They are beautifully laid out. A single pin connects each hornet to a paper record written with impossibly small, impossibly perfect handwriting. It is evocative of a golden age of natural history. An age, it dawns on me as I stand surrounded by cabinets of carefully documented insects, that may never really have ended.

Asian hornet collection

We take the shelves to a nearby room to take a closer look. Peering at them through the glass, the first thing I notice is that they’re smaller than I expected. I’d imagined something at least thumb-sized. They’re still larger than common wasps, but not by much. They’re darker too: a deep velvety black covers most of their bodies, in brilliant contrast to the yellow and orange hues that highlight their faces, abdomens and the tips of their legs.

The population of Asian hornet that has begun to colonise Europe, Vespa velutina nigrithorax, has a particularly dark appearance. Up close it becomes clear that they are also covered in a fine layer of fuzzy hair. I think it would be wrong of me to say this makes them look cute, but it certainly adds… character. As we spoke, Dr Broad took a high-resolution photo using the museum’s special insect imaging setup. Take a look and decide for yourself:

Close up of Asian hornet's head

A brief history of hornets

As I peer at the hornets, Dr Broad provides me with the broad brush strokes of their evolutionary history. Hornets, it turns out, are the result of a series of key evolutionary innovations. Hymenoptera, the order of insects containing hornets (as well as wasps, bees and ants), emerged in the Triassic, over 200 million years ago. One of the keys to their success was the evolution of the ovipositor, a specialised organ for laying eggs. This appendage really opened doors for the hymenopterans, as they could now lay their eggs in new places. These could be hard-to-access crevices, where eggs would be safer from predators. Or, perhaps most ingeniously, they could be on or even inside food sources such as fruits or, in the case of the devious parasitic wasps, other insects.

Fast forward millions of years and we arrive at the emergence of the subclade Aculeata. In these hymenopterans the ovipositor underwent a startling transformation. Instead of delivering eggs, it became dedicated to delivering venom. The dreaded stinger was born. An organ used to create life metamorphosed into one that takes it. This sent the Aculeata down a very different evolutionary road. While their ancestors laid their eggs on the food and then effectively abandoned them, the Aculeata started to hunt and provision food for their offspring instead.

The next key development in the path leading to hornets was the emergence of eusocial societies. Eusociality is considered the highest level of organization among animal societies. It is defined by cooperative breeding, with breeding and non-breeding members dividing labour and working together. There are many startling parallels with our own societies here, yet they remain distinctly other, and fascinatingly so. Eusociality appears to have arisen independently multiple times in the Aculeata, in the social wasps, bees and ants. It may be that eusociality was made possible by the previous innovation, the stinger, which provided these fledgling societies with a viable way to defend their nests from intruders.

And what nests they are.

Not only does the NHM collection include specimens of the hornets themselves, it also has rows of boxes containing nests of different species. Dr Broad opens one box, at least an arm span wide, that is filled with a near-complete example of an Asian hornet nest. It’s truly an architectural wonder. Made from wood pulp chewed and mixed with hornet saliva, it is both incredibly light and unbelievably sophisticated. The nest would have originally hung from a tree, and indeed a few remnant branches are still embedded in it. In this nest large swathes of the rippled paper shell are missing, revealing the interior structure. Horizontal discs, inlaid with hexagons that housed the larvae, are suspended by vertical columns that connect the levels. All the more remarkably, each nest is created and then abandoned within a single year.

Asian hornet's nest

It’s amazing to think that insects are capable of building such structures. One wonders what lessons there might be here for our own architects, given what these hornets can achieve with only a little wood and a little spit. Indeed, the field of biodesign continues to draw inspiration from these invertebrate constructs. Though for all their beauty the fact that they are usually packed with thousands of hornets would make me hesitant to admire one in the wild, where they are more akin to a devil’s piñata.

Having seen the rather beautiful hornets and their astonishing nests, I’m starting to wonder what the big deal is about the threat of Asian hornets colonising the UK. After all, we already have native hornets in Europe, so what harm could another species possibly do?

Apparently quite a lot. It turns out that unlike our native hornet species, which tend to be generalist hunters of many different insects, the Asian hornet is more like a precision honeybee-seeking missile. Asian hornets will search for honeybee hives and, once they find one, engage in ‘hawking’ behavior. This involves hovering near the entrance to the beehive and killing bees as they fly to and from the nest or try to defend themselves. The hornet, covered in a tough carapace, is practically impervious to the bees’ attacks, while its own powerful mandibles are quite capable of tearing the poor bees apart.

In Asia, the local honeybees have coevolved with these hornets for millennia. This has resulted in the honeybees developing more effective defence mechanisms, the most famous being the ‘bee ball’ formed by Japanese honeybees. While European honeybees have been observed forming these bee balls, the balls do not appear to be intense enough to kill attacking hornets effectively.


So for now the Asian hornets in Europe have the upper hand, mounting raids on honeybees that do not yet have a way to defend themselves. European honeybees have been in a perilous state for many years, with colonies collapsing at alarming rates. The causes are varied, such as heavy pesticide use, and many of them remain poorly understood. They really deserve a break. The arrival of a new and voracious specialised predator is the last thing that they need. Scientists and farmers alike are concerned that if Asian hornets become well established throughout Europe, this will have a devastating impact on agricultural productivity. Or, in the worst-case scenario, it could be the last nail in the coffin for the European honeybee.

How did they get here?

While there have only been two confirmed nests in the UK, Asian hornets are already established in France and Spain. How did they get to Europe in the first place? The most likely scenario, Dr Broad tells me, is that a hibernating queen arrived as a stowaway in a crate. It turns out that, like us, social insects such as the Asian hornet are particularly well suited to becoming international colonisers, though the reasons for their success are perhaps quite different from our own. The queen hornet mates with males only once in her life, during an initial mating flight. For the rest of her life the queen will carry the sperm she received in a specialised organ. She will use this to fertilise all future eggs she lays. Apparently this remarkable process of internal fertilisation on demand is so precise that, on average, fewer than two sperm are released per fertilisation. Therefore, it only takes a single queen to establish a viable population. A lone voyager who contains within her the seeds of an entire society, waiting to bloom.

Vespa velutina has a particularly wide geographic range in Asia, and this ability to tolerate a variety of climates likely contributes to its success as an invader. However, where exactly within Asia these invertebrate pilgrims came from remains a mystery. Here is where DNA could provide the answers. The DNA sequence of an individual is a rich source of information. To understand it, though, we first need a road map: a ‘reference genome’ from the same species that tells us the general shape and structure of the genome, so that we can make sense of the pieces from any given individual. We hope that sequencing and assembling such a reference genome for the Asian hornet can shed light on this mystery, as well as on many of its other secrets. What else can we hope to learn from sequencing the Asian hornet genome?
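To make the ‘road map’ idea concrete, here is a minimal sketch of the simplest use of a reference: listing where one individual’s DNA differs from it. It assumes the individual’s sequence has already been lined up against the reference, letter for letter. Real pipelines use dedicated alignment software and also handle inserted or missing letters, which this ignores. The sequences are made up and are not real hornet DNA.

```python
def find_differences(reference: str, individual: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_letter, individual_letter) for each mismatch.

    Assumes both sequences are already aligned and the same length;
    insertions and deletions are not handled in this simplified sketch.
    """
    if len(reference) != len(individual):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, individual))
        if ref != alt
    ]


# Hypothetical toy sequences.
reference = "ATGGCGTACGTTAGC"
individual = "ATGGCATACGTTGGC"
for pos, ref, alt in find_differences(reference, individual):
    print(f"position {pos}: {ref} -> {alt}")
```

Differences like these, collected from hornets across Europe and Asia, are the raw material for tracing where the invaders came from.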

A genomic perspective

As well as helping us identify where exactly in Asia the hornets are likely to have come from, the genome will enable us to design markers to help monitor their spread across Europe. This kind of information could be crucial to monitor which populations are spreading the fastest and which management strategies are proving most effective.

The Asian hornet genome can help us to understand the biology that underpins the amazing adaptations of this species. While this knowledge is fascinating in its own right, it could one day be used to help control their spread. For example, the Sumner lab proposes that if we can identify the chemical odorants and compounds that this species is sensitive to that information could be used to try and manipulate their behavior to limit their production.

Their website has more ideas on how the genome might be used, including the potential for biocontrol by gene-editing – and a list of great references if you want to get into the details.

So far, at least in the UK, vigilance has paid off. Both Asian hornet nests were quickly identified by the public and removed before they had a chance to spread. If you’d like to get involved, there is an ‘Asian Hornet Watch’ app you can download (what a world!) to report sightings of Asian hornets, and a handy identification poster.


I leave the Natural History Museum filled with a sense of awe, both for the Asian hornet and the museum’s collections which make such intimate encounters with the natural world possible. The carefully preserved and recorded specimens are a priceless trove of information. Sequencing the genome will be the latest chapter in our efforts to catalogue and understand the hornet. A new piece in an old game.

Asian hornet from above

About the authors:

Dr Alex Cagan is a postdoctoral fellow in Iñigo Martincorena’s research team at the Wellcome Sanger Institute, studying mutation and selection in healthy tissues and how this relates to cancer and ageing.




The golden eagle genome has landed

By: Kat Arney and Rob Ogden
Date: 03.09.18


The golden eagle genome has been sequenced as part of the Sanger Institute’s 25 Genomes Project

The golden eagle is undoubtedly one of the UK’s most iconic birds. With an impressive 2 metre wingspan and striking yellow feathered legs, it’s a stirring sight if you’re lucky enough to spot one soaring over the Scottish Highlands and Islands.

While golden eagles may not be critically endangered – the IUCN’s Red List of Threatened Species lists them as being of ‘least concern’ – their habitat is shrinking. Many of the already small populations scattered through Europe, Japan and other areas are continuing to decline, which is why today’s announcement of a new golden eagle genome is so important.

The eagle genome has been completed as part of our 25 genomes project, sequencing the genomes of 25 significant UK species ranging from pipistrelle bats and Eurasian otters to spiders, starfish and summer truffles. But while the announcement of a newly-sequenced species is undoubtedly exciting to fans of genomics, having a complete golden eagle genome is also a vital tool to help conservationists protect and manage these fabulous birds.

Genetics meets conservation

Conservation geneticist Dr Rob Ogden at the University of Edinburgh has been using simple DNA profiling and sequencing to monitor the genetic makeup of animal populations for at least 20 years, studying species as diverse as endangered gazelles, manta rays and (of course) golden eagles.

But, as Rob explains, while these tests can provide useful information about a population – such as how genetically diverse it is, and how individuals are related – they can only tell us so much.

“This basic information can help us when we come to make decisions about how to manage populations, but it’s based on looking at differences in small ‘snapshots’ of DNA scattered throughout the genome,” he says. “What we don’t really understand is what any of these genetic differences relate to in biological terms. If you keep a couple of populations separate for multiple generations, parts of their DNA will gradually drift apart, but we don’t know if they have any biological relevance at all.”
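Here is what a simple summary of ‘how genetically diverse’ a population is can look like in practice. This minimal sketch uses one standard measure from population genetics, expected heterozygosity (He = 1 minus the sum of the squared allele frequencies), averaged over marker positions. It is an illustration only. It assumes allele frequencies at each marker have already been estimated, the numbers are made up, and it is not necessarily the statistic Rob’s team uses.

```python
def expected_heterozygosity(allele_freqs: list[float]) -> float:
    """He = 1 - sum(p_i^2) for one marker, given its allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def mean_heterozygosity(markers: list[list[float]]) -> float:
    """Average He across markers scattered through the genome."""
    return sum(expected_heterozygosity(f) for f in markers) / len(markers)


# Hypothetical allele frequencies at three markers in one population.
population_a = [[0.5, 0.5], [0.9, 0.1], [0.25, 0.25, 0.5]]
print(round(mean_heterozygosity(population_a), 3))  # 0.435
```

A higher value means more genetic variety. As Rob points out, though, the number says nothing about what those variants actually do.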

To draw an analogy with language, a simple alteration in spelling – for example, switching recognise to recognize – makes no difference to the meaning of the word. But more significant changes might alter the meaning of a word altogether, like changing ‘recognise’ to ‘organise’.

The simple DNA tools that Rob and his team have been using up until now can spot that individual ‘letters’ have changed, but they can’t identify the context of the biological ‘words’ in order to tell whether the difference is meaningful. And to read the ‘words’ in DNA (genes), you need to read the full genome.
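The spelling analogy has a direct counterpart in how genes are read. DNA is read three letters (a codon) at a time, and each codon stands for an amino acid. Some single-letter changes leave the amino acid unchanged, the ‘recognise/recognize’ case. Others swap it or insert a stop signal, the ‘recognise/organise’ case. This minimal sketch uses the standard genetic code. It assumes we already know the stretch is a protein-coding gene and where its reading frame starts, which is exactly the context a full genome provides. The sequence is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def classify_change(cds: str, pos: int, alt: str) -> str:
    """Classify a single-letter change at `pos` in a protein-coding sequence.

    Assumes `cds` starts at the first letter of a codon (reading frame 0).
    """
    start = pos - pos % 3                  # first letter of the affected codon
    ref_codon = cds[start:start + 3]
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # spelling change, same meaning
    if alt_aa == "*":
        return "nonsense"     # word cut short by a premature stop
    return "missense"         # different amino acid, possibly different meaning


cds = "ATGGCTTGG"  # hypothetical: Met-Ala-Trp
print(classify_change(cds, 5, "C"))  # GCT -> GCC: synonymous
print(classify_change(cds, 3, "A"))  # GCT -> ACT: missense
print(classify_change(cds, 8, "A"))  # TGG -> TGA: nonsense
```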

The DNA that was used to create the new golden eagle genome came from a chick that was found dead in a Scottish nest during a raptor health study and was read using PacBio SMRT technology. Unlike other DNA reading methods, PacBio’s technique generates very long, high-quality stretches of sequence from which it’s easier to build a whole genome. This allowed the researchers to build what’s known as a ‘reference genome’, against which DNA from other golden eagles around the world can be compared.

Adapting to a changing world


The full genome sequence for the golden eagle will help conservation efforts

Having a high-quality full genome sequence for the golden eagle opens up a treasure trove of biological information that conservationists can use to manage species more effectively in the wild.

“Now we have the whole genome, we can identify specific genes and work out what they do, so we can see whether a specific change is likely to affect what happens in a cell or in a whole animal,” Rob says.

“Golden eagles are spread around the world in lots of different habitats and climates – there are hot weather birds and cold ones, eagles in forests and others in the hills – so are their genes adapted to their local environment? Do the genetic differences we see relate to important differences in the physiology of the animals which are related to how they can best survive in that particular environment?”

This knowledge is vital for managing endangered populations effectively as the global climate changes. One conservation tool is land management – generating certain types of habitats that will encourage particular species. Another option is translocation, moving animals from one area to another or releasing captive animals back into the wild. But if those creatures are poorly adapted to the environment they’re being put into, then there’s a good chance they’ll fail to thrive.

As temperatures across Europe are expected to increase over the next century and habitats change, it’s unlikely that large species like eagles will be able to adapt fast enough to cope. Instead, the most likely solution is for populations to move north in search of cooler climes.

“We know that Mediterranean golden eagles are genetically quite different from the Scottish birds, so perhaps we might see a situation where eagles from warmer climates become better adapted to a changing habitat type in northern Europe than the existing population that’s there now,” Rob explains.

“But if it’s taken 10,000 years to evolve a particular trait, there’s no way that’s going to adapt to climate change in the next hundred years, so understanding how these locally adapted populations have come about is really important for predicting how we can manage species in the future.”

Taking flight


Golden eagle in flight. Image credit: Martin Mecnarowski, Wikimedia Commons.

The completion of the golden eagle genome as part of the 25 genomes project is only the first part of the story. The golden eagle has been selected as one of the species to go forward into the Genome 10K project, carrying out detailed analysis of DNA from around 10,000 vertebrate species. Researchers will be using a technique called optical mapping to get an even more detailed picture of how the eagle genome is organised and to make sure they haven’t missed any bits.

“The golden eagle has been promoted up to the Premier League in terms of the quality of genome that we are going to obtain for it in the future,” Rob says. “The genome we have now is more detailed than anything that has been done before with golden eagles by quite a long way, but the next step is to make it way better – the best of all wild bird species.”

Having been lucky enough to watch a pair take flight in the hills on the Scottish island of Skye, swooping and circling round each other in a charming courtship dance, I find it easy to argue that golden eagles themselves are better than a lot of other birds.

“They certainly are very cool!” laughs Rob. “They’re an iconic species in the UK – people recognise them and are proud of them, but that’s true in every culture where you find golden eagles. It’s something that helps with conservation education because people can really relate to these animals and support projects that focus on saving them. They’re fantastic birds to work on.”

About the authors:

Dr Kat Arney is a science writer, public speaker and broadcaster, and author of the popular genetics books Herding Hemingway’s Cats and How to Code a Human. 

Dr Rob Ogden is Head of Conservation Genetics at the University of Edinburgh and a scientific adviser to the South of Scotland Golden Eagle Project.





A new deal on data – articulating the contract between science and people

By: Anna Middleton, Vivienne Parry and Julian Borra
Date: 20/06/2018


Are you with us?

For most of us it is hard to unpick the various declarations, assurances and guarantees made regarding the sanctity of our data. Even the General Data Protection Regulation still feels quite far removed from the everyday lives of ordinary people, and it seems to have been drawn up without any consultation with them. People need to both see and hear proof that they’ve been listened to. And they will act against anyone who seems to wilfully dismiss or disregard them, with every right to do so. With Facebook recently under the spotlight, there is tangible alarm about the use of our personal information by others. A breach of confidence or inappropriate access to data becomes really sensitive when we consider our most precious and personal information. In a health sense, what is more personal than our DNA? It’s what makes us ‘us’.

We broadly know that scientists, clinicians and academic institutions collect, store, research and share DNA and medical information as part of the global endeavours to understand human health and treat human suffering. As part of this endeavour, DNA information bounces around the Internet on an unbelievably massive scale, in ways unknown to the person who donated the data.

We probably expect ‘science’ is gathering, storing, analysing and sharing our data with respect, transparency and integrity. Whilst we hope that there is choice in this and we hope that we have actively consented, have we ever really discussed this as a collective society? Is this even possible?

Is it widely known that, particularly in genetic research, it is only possible to interpret what a glitch in DNA means if there are hundreds of thousands of DNA glitches from other people to compare it to? So, Big Data and DNA go hand in hand, and both are necessary for genomic medicine to deliver on its promises.
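Here is a minimal sketch of why comparison with other people matters. It assumes that genotypes for a single DNA variant are already available for a group of people, with each person carrying 0, 1 or 2 copies. The cohort is hypothetical. Only by pooling many people’s data can we tell that a particular glitch is rare (here, carried on 1 in 2,000 chromosomes) rather than common.

```python
def allele_frequency(genotypes: list[int]) -> float:
    """Fraction of chromosomes carrying the variant.

    Each person has two copies of each chromosome, so each genotype is
    0, 1 or 2 copies of the variant.
    """
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1 or 2 copies")
    return sum(genotypes) / (2 * len(genotypes))


# Hypothetical cohort of 10,000 people, 10 of whom carry one copy.
cohort = [0] * 9_990 + [1] * 10
print(f"{allele_frequency(cohort):.4%}")  # 0.0500%
```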

But, if science is truly going to serve humankind in the best way possible we need to be clear on the terms of the interaction and transaction with people, on their terms. And to do that we need a simple and clear conversation; to be certain that we can fulfil their demands or at least understand their desires and concerns.

The need for a People Powered conversation

The why?

A. The world of data is leaky

B. ‘Society’ hasn’t yet been part of a clear conversation

A. The world of data is leaky

When thinking about the leakiness of data, we have to be honest. Nothing is perfect. No data is 100% secure. No system is flawless. No regulation is absolute. No cache of information is 100% bulletproof – and if anyone promises that, they’re overpromising.

This is a given that we have got to accept.

The type of data we are talking about here is the purest, most precious kind, fundamental to our identity and existence. DNA and linked medical data – the foundational stuff that makes us who we are. Whilst our data might be ‘de-identified’, i.e. our name and address have been uncoupled from it, ‘anonymity’ cannot be absolutely guaranteed. Health information can always be linked to other personal information that is also on the web. In our increasingly data-connected world, it is entirely feasible that we could, in theory, be identified from our DNA alone.

B. Society hasn’t yet been part of a clear conversation

There are a lot of companies and regulatory bodies that broadcast commitments and assurances about data use. But as there has been no collective societal ‘sign-up’, the pronouncements and commitments could be seen as one-sided. Aside from (relatively small-scale) targeted engagement initiatives, there hasn’t yet been a global two-way conversation. No complete consultation. No reciprocity. No serious voice given to the most important people and the principal recipients of the good works undertaken with their data.

This is especially problematic when it comes to trying to get more people to share their precious DNA – their genome – to advance medical research and progress healthcare. Which is why the scientists need to ‘go first’ with starting this conversation.

The Crunch

To move forwards we need:

  • the medical, clinical and academic institutions and the policy makers to clearly articulate the assumptions behind ‘people’s best interests’ and make this available for debate.
  • society to accept the tiny risk inherent in sharing their data with individuals, organisations

We need the people on both sides to be in this together – mutually accepting and supporting the power of precious data sharing to make life better.

Going forwards

Drawing up the New Deal

Simplicity is key. Two clear parties. Two clear beneficiaries. And equally mutual rewards.


This is a reciprocal people-powered deal that brings both sides together for better. And the people’s voice must be consulted, heard and written into it.

This will require a comprehensive consultation process involving ordinary people from all walks of society.

This should involve Qualitative and Quantitative explorations and interrogations of the topic and the terms of the deal. It should involve experts in large-scale, population engagement techniques.

How do we start the conversation?

We need a starting point for that conversation – an ‘in’; and starting with the genome isn’t it. We know from our own research that the vast majority of the broader public have not yet encountered the term. However, more than 90% of the public are online and feeding their data into the grid. Thus ‘data’ is the conversation starter that can take us to DNA.

The binary algorithms that once sat invisibly inside the tech tools that serve humanity have now become visible: data has become a ‘thing’. Something we can point at, hold up, scrutinise and hold accountable. Data and its big brother, Big Data, are now discussed, interrogated and judged everywhere from the Senate Commission to Mumsnet.

So, Data; our relationship with it; and with those who harvest, explore and administer it ‘on our behalf’ gives us a rich area from which to begin.

The conversation needs to focus on how science and humanity collaborate and win, together.


Language and Tone are everything. Pub and school gate rules apply (i.e. it can be discussed anywhere and everyone can participate). This is a People Powered Deal. Not a Protocol. This is a simple deal that respects and honours every human’s right to control their own data destiny. It lets them confidently enter an agreement, believing that the terms will be upheld to the best of everyone’s ability. Which means it must be couched in clear, simple terms.


We need the New Deal to be visible to all at every level. This will require a robust channel strategy, so we would also need to test the best channels for spreading the word. And we would need to answer some pretty simple questions. Is it a newsworthy event? Is it a web-based platform for commitment with visible partners? Is it a socially driven call for better – a clarion call where we give the New Deal to the people and get them to use it as a lever to agitate for better – a movement?

We feel it is time for science and policy to scrutinise their direction of travel – with less rhetoric about the benefits of research and delivery of science (i.e. going in one direction, from them to us) and more about serving humankind, recognising that we are all in this together. We, collectively, are a partnership, and we need the people of society to feel they sit with the scientists so that the journey into human discovery is one made together.

About the authors:

Anna Middleton is Head of Society and Ethics Research, Wellcome Genome Campus, Cambridge.

Vivienne Parry is Head of Engagement, Genomics England, London.

Julian Borra is the Founder of Thin Air Factory, London.

This article has been reproduced from the GenomEthics Blog.


25 Genomes update. Yes, it’s been a while …

By: Dan Mead, the 25th Anniversary Sequencing Project Coordinator
Date: 12/06/2018

25 Genomes Project, Wellcome Sanger Institute


The project had been progressing at a steady-ish rate until a few weeks ago, when we ran into some technical problems.

We’re using a number of different technologies to make the final genomes of our 25 species. They all serve slightly different purposes, with the aim that they complement each other. Combined, these technologies (and the clever people and computer programs that check the data) mean that we can make very, very good quality genomes in a matter of months. These may be better than the human genome, which took over 10 years with the old stuff.

So where are we now?

PacBio complete for 13

Pacific Biosciences SEQUEL system. This is the main technology we use, and you can get a pretty good genome with it alone. It uses long pieces of DNA (about 50,000 letters). Like most other sequencing technologies, it labels the DNA with coloured dyes and takes photos as each letter is added to the strand being copied in a well. The difference is the scale: this technology lets you ‘read’ tens of thousands of letters of DNA per well (and there are 1 million wells), leading to a better genome. See the video below for a better explanation.
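To get a feel for that scale, here is a back-of-the-envelope sketch in Python. The figure of 20,000 letters per well is our own illustrative assumption (the post only says ‘tens of thousands’). The `productive_fraction` parameter is a hypothetical knob for the fact that not every well yields a usable read. Left at 1.0, the result is an upper bound, not a measured yield.

```python
def max_letters_per_run(wells: int, letters_per_well: int,
                        productive_fraction: float = 1.0) -> int:
    """Rough upper bound on the DNA letters read in one run.

    wells: number of wells (the post quotes about 1 million).
    letters_per_well: assumed read length per well (illustrative only).
    productive_fraction: hypothetical share of wells that give a usable read.
    """
    productive_wells = int(wells * productive_fraction)
    return productive_wells * letters_per_well


# 1 million wells x 20,000 letters each, if every single well produced a read
print(max_letters_per_run(1_000_000, 20_000))  # 20000000000
```

Twenty billion letters is a ceiling. Lowering `productive_fraction` shows how quickly the real figure drops if only some wells work.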

10X complete for 16

10X Genomics Chromium system. This is a clever new use of existing Illumina sequencing capabilities. Short reads that come from the same long piece of DNA are tagged with the same barcode, which allows us to map smaller bits of DNA into a larger picture.
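Here is a simplified sketch of the idea in Python. Suppose short reads carrying the same barcode land on two different contigs (assembled stretches of DNA). Those contigs probably came from the same long molecule, so they probably sit near each other in the genome. The input format (one (contig, barcode) pair per aligned read) and the function name are our own assumptions. Real scaffolding tools also consider which end of each contig the reads fall on, which this toy ignores.

```python
from collections import defaultdict
from itertools import combinations


def shared_barcodes(reads):
    """Count barcodes shared between each pair of contigs.

    reads: iterable of (contig, barcode) pairs, one per aligned short read.
    Returns {(contig_a, contig_b): n_shared} for pairs sharing at least one barcode.
    """
    barcodes_by_contig = defaultdict(set)
    for contig, barcode in reads:
        barcodes_by_contig[contig].add(barcode)
    links = {}
    for a, b in combinations(sorted(barcodes_by_contig), 2):
        n_shared = len(barcodes_by_contig[a] & barcodes_by_contig[b])
        if n_shared:  # only keep pairs with evidence of being neighbours
            links[(a, b)] = n_shared
    return links


# Toy barcodes: real ones are much longer DNA tags
reads = [("ctg1", "AAA"), ("ctg1", "CCC"), ("ctg2", "AAA"), ("ctg2", "CCC"),
         ("ctg2", "GGG"), ("ctg3", "GGG"), ("ctg3", "TTT")]
print(shared_barcodes(reads))  # {('ctg1', 'ctg2'): 2, ('ctg2', 'ctg3'): 1}
```

The more barcodes two contigs share, the stronger the case for placing them side by side.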

Hi-C complete for 2

This technique, invented by Erez Lieberman Aiden, gives an even bigger picture of how the bits of DNA fit together, allowing them to be put together in chromosome-sized chunks.
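Hi-C works because stretches of DNA on the same chromosome touch each other far more often than stretches on different chromosomes. The Python sketch below groups contigs into chromosome-sized clusters. It uses a union-find structure to join any pair whose contact count passes a threshold. The contact counts, the threshold and the function name are illustrative assumptions. Real Hi-C scaffolders also use the counts to order and orient contigs within each group.

```python
from collections import defaultdict


def group_contigs(contigs, contacts, min_contacts):
    """Union-find: join contigs whose Hi-C contact count reaches min_contacts."""
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps the trees shallow
            c = parent[c]
        return c

    for (a, b), n in contacts.items():
        if n >= min_contacts:
            parent[find(a)] = find(b)  # merge the two groups

    groups = defaultdict(list)
    for c in contigs:
        groups[find(c)].append(c)
    return sorted(sorted(g) for g in groups.values())


contigs = ["c1", "c2", "c3", "c4"]
contacts = {("c1", "c2"): 120, ("c2", "c3"): 95, ("c3", "c4"): 3, ("c1", "c4"): 1}
print(group_contigs(contigs, contacts, min_contacts=20))  # [['c1', 'c2', 'c3'], ['c4']]
```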

Bionano Genomics SAPHYR.

Another way of fitting DNA together. This one is especially useful for spotting large chunks of DNA that have moved around.
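Optical mapping tags a short sequence motif wherever it occurs along very long DNA molecules and records the spacing between the tags. The sketch below shows the in-silico side of that comparison: find the motif positions in a sequence and list the gaps between them. We assume the motif CTTAAG purely for illustration. It reads the same on both strands, so searching one strand is enough. Comparing the gap pattern from the map with the pattern from the assembly is what reveals a large chunk that has moved.

```python
def label_positions(seq: str, motif: str = "CTTAAG") -> list[int]:
    """0-based start positions of every occurrence of motif in seq."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)  # keep searching after this hit
    return positions


def label_gaps(positions: list[int]) -> list[int]:
    """Distances between consecutive labels: the 'barcode' an optical map records."""
    return [b - a for a, b in zip(positions, positions[1:])]


original = "CTTAAG" + "A" * 10 + "CTTAAG" + "A" * 20 + "CTTAAG"
moved = "CTTAAG" + "A" * 20 + "CTTAAG" + "A" * 10 + "CTTAAG"  # two chunks swapped
print(label_gaps(label_positions(original)))  # [16, 26]
print(label_gaps(label_positions(moved)))     # [26, 16]
```

The two sequences contain the same labels, but the order of the gaps gives the rearrangement away.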

[basic] Genome assembly complete for 14

So not bad progress. We’re a little delayed, but ok for now.

The trouble with starfish

However, some species are proving to be rather problematic, most notably the starfish. We got [a lot] of sperm from one starfish* a few months ago. We thought that, as the sole purpose of sperm is to deliver DNA to an egg, it would be a good place to start. Wrong.

For some as-yet-unknown reason, the DNA in starfish sperm is oddly fragile. When we tried to extract it from the cells, it broke up into bits only 200 letters long, WAAAAY shorter than the 150,000 we were aiming for.

You might wonder how we got starfish sperm. Apparently there’s a special chemical (called GSS, for ‘gonad-stimulating substance’) that you inject into the starfish. It makes them, shall we say, ‘produce’ sperm in surprisingly large quantities.

Flatworms aren’t too helpful, either

Working with flatworms hasn’t been straightforward either. Their sliminess is a problem, but not the only issue. The worms are essentially just a long gut surrounded by a bit of muscle and other anatomical odds and ends. This means they contain a lot of nasty enzymes and other digestive juices that are specifically designed to break up long molecules (see below for a video).

When you combine sliminess and a large concentration of enzyme with the effects of freezing for storage, you end up with what was affectionately labelled ‘a zombie worm mush’ by our wormologists. Needless to say, the DNA was not of a usable quality.

And as for truffles…

Truffles, too, seem not to like having their DNA extracted. After a few unsuccessful attempts, we’re going to try a technique from 1992 that seems simple and gave good results in the paper it came from, so fingers crossed…

About the author:

Dan Mead is the 25th Anniversary Sequencing Project Coordinator for the 25 Genomes Project at the Wellcome Sanger Institute, Cambridge.

More on the 25 Genomes Project:

25 Genomes Project web page 

Human Cell Atlas, Sanger Science

A trusty guide for exploring the complexity of cells

By: Martin Hemberg and Vladimir Kiselev
Date: 14.05.18


Scmap can map individual cells from a query sample to cell types or individual cells in a reference. Previously identified cell types are coloured; unknown types are grey.

Ever since scientists first used a microscope to inspect cells, it has been recognized that they can be grouped into distinct cell-types based on their morphology. The differences between cell-types, in both form and function, can be striking, even though all somatic cells in an organism share the same DNA. These differences arise largely because each cell-type expresses only a subset (roughly 10,000) of the ~20,000 genes in our genome, and different cell-types express different subsets.

Traditionally, cell-types are defined based on morphology (shape). However, recent technological advances have made it possible to measure the expression levels of all ~20,000 genes in individual cells. The technology is known as single-cell RNA-seq (scRNA-seq), and it builds upon the powerful methods that were initially developed as part of the Human Genome Project.

To carry out a scRNA-seq experiment, the biological sample provided (e.g. some blood, a piece of skin or a biopsy from an organ) is dissociated and the cells are isolated individually. A set number of cells are then randomly selected to have their mRNA extracted and profiled. Using computational analysis methods, cells with similar profiles are grouped together, making it possible to identify cell-types based on which genes are expressed.
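To make the grouping step concrete, here is a minimal Python sketch. Each cell’s counts are scaled to the same total and log-transformed, and the cells are then clustered with plain k-means. This is one common choice for illustration, not the specific method used by any particular tool. The toy counts, the scale factor of 10,000 and the choice of starting cells are all assumptions.

```python
import math


def log_normalise(counts, scale=10_000):
    """Scale each cell's counts to the same total, then apply log(1 + x)."""
    normalised = []
    for cell in counts:
        total = sum(cell) or 1  # avoid dividing by zero for an empty cell
        normalised.append([math.log1p(scale * c / total) for c in cell])
    return normalised


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_idx, n_iter=10):
    """Plain k-means; init_idx picks which cells seed the k centroids."""
    centroids = [list(points[i]) for i in init_idx]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each cell joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: squared_distance(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its cells.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy counts: 6 cells x 4 genes; cells 0-2 favour genes 0-1, cells 3-5 genes 2-3.
cells = [[50, 40, 0, 1], [45, 55, 2, 0], [60, 35, 1, 1],
         [0, 1, 70, 30], [2, 0, 65, 40], [1, 2, 55, 45]]
print(kmeans(log_normalise(cells), init_idx=[0, 3]))  # [0, 0, 0, 1, 1, 1]
```

Real analyses usually add steps such as gene selection and dimensionality reduction before clustering, and the number of groups is rarely known in advance.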

In the fall of 2016, the Human Cell Atlas (HCA), a hugely ambitious international project to “generate a comprehensive map of all 37 trillion cells in the human body” was launched. The HCA uses scRNA-seq to profile cells from the human body and one of the goals is to define cell-types based on mRNA profiles. Most likely, the first release of the HCA will contain more than 100 million cells that have been profiled using scRNA-seq.

One of the key challenges will be to make sure that the HCA reference can be queried in a way that supports the questions likely to be asked most frequently, such as comparing cells from a new sample to the reference. This could be important, for example, in a clinical setting, where a doctor would be able to compare a patient sample (e.g. from an unhealthy liver) to the reference. Such a query would allow the doctor to determine whether there is a major imbalance in the composition of cells, or even whether there are cells that have acquired a disease state (e.g. cancer) that is not present in healthy individuals.
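Once each cell in a patient sample has been assigned a cell type, checking for an imbalance in composition can be as simple as comparing proportions. The Python sketch below is a hypothetical illustration, not part of the HCA or scmap. The cell-type labels, the ‘unassigned’ label for cells that match nothing in the reference and the 0.15 reporting threshold are all our own assumptions.

```python
from collections import Counter


def proportions(labels):
    """Fraction of cells carrying each label."""
    counts = Counter(labels)
    total = len(labels)
    return {ct: n / total for ct, n in counts.items()}


def composition_shift(query_labels, reference_labels, min_shift=0.15):
    """Return {cell type: change in proportion} for shifts of at least min_shift."""
    q = proportions(query_labels)
    r = proportions(reference_labels)
    shifts = {}
    for ct in set(q) | set(r):  # include types seen in only one sample
        delta = q.get(ct, 0.0) - r.get(ct, 0.0)
        if abs(delta) >= min_shift:
            shifts[ct] = delta
    return shifts


reference = ["hepatocyte"] * 8 + ["Kupffer cell"] * 2
patient = ["hepatocyte"] * 5 + ["Kupffer cell"] * 3 + ["unassigned"] * 2
print(composition_shift(patient, reference))
```

Here the drop in hepatocytes and the appearance of unassigned cells are reported. The smaller change in Kupffer cells falls below the assumed threshold.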

To support such queries, we have developed a novel computational method called scmap, which takes a query and a reference scRNA-seq dataset as the input. For each cell in the query, scmap can identify both the cell-type and the individual cell from the reference that provides the best match, as in the Figure above.
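Here is a minimal Python sketch of the cell-type route. It collapses the reference to one average profile (centroid) per cell type. Each query cell then gets the type of the most similar centroid, or ‘unassigned’ (the grey cells in the figure) if even the best match is weak. This is a simplification. We use cosine similarity only and assume a threshold of 0.7, whereas, as we understand it, the published method combines several similarity measures. Treat the names and numbers here as illustrative.

```python
import math
from collections import defaultdict


def build_centroids(profiles, cell_types):
    """Average the reference cells of each type into one centroid."""
    groups = defaultdict(list)
    for profile, cell_type in zip(profiles, cell_types):
        groups[cell_type].append(profile)
    return {ct: [sum(col) / len(ps) for col in zip(*ps)] for ct, ps in groups.items()}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def map_cell(query, centroids, threshold=0.7):
    """Best-matching cell type, or 'unassigned' if the best similarity is too low."""
    best_type, best_sim = max(((ct, cosine(query, c)) for ct, c in centroids.items()),
                              key=lambda pair: pair[1])
    return best_type if best_sim >= threshold else "unassigned"


reference = [[5, 4, 0, 0, 0], [6, 5, 0, 1, 0], [0, 0, 7, 3, 0], [0, 1, 6, 4, 0]]
types = ["type_A", "type_A", "type_B", "type_B"]
centroids = build_centroids(reference, types)
print(map_cell([5, 5, 0, 0, 0], centroids))  # type_A
print(map_cell([0, 0, 5, 5, 0], centroids))  # type_B
print(map_cell([0, 0, 0, 0, 9], centroids))  # unassigned
```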

Comparing scRNA-seq profiles is challenging, mainly for two reasons: the data is high-dimensional (approximately 20,000 genes) and it is noisy.

Scmap is based on a recently developed feature selection algorithm for scRNA-seq data from the Hemberg lab. The algorithm is able to identify the subset of genes that are most informative for clustering in an unsupervised manner, and it uses state-of-the-art machine learning methods to achieve high specificity and sensitivity. Moreover, scmap is very fast, which means that it can be used for real-time searches of very large references.
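One way to pick informative genes without labels is to look at dropouts: cells in which a gene is not detected at all. Genes expressed strongly in only some cells have more dropouts than their expression level would predict, which makes them useful for telling cell types apart. The sketch below fits a straight line to log(dropout rate) against log(mean expression in the cells where the gene is detected). It then ranks genes by how far they sit above that line. It is written in the spirit of a dropout-based approach and is not the exact scmap procedure. The function name and the toy matrix are ours.

```python
import math


def select_features(matrix, n_features):
    """Indices of the n_features genes with the largest excess of dropouts.

    matrix: list of genes, each a list of counts (one per cell).
    """
    xs, ys, genes = [], [], []
    for gene, counts in enumerate(matrix):
        detected = [c for c in counts if c > 0]
        dropout = 1 - len(detected) / len(counts)
        # Skip genes detected nowhere (mean undefined) or everywhere (log(0) undefined).
        if not detected or dropout == 0:
            continue
        xs.append(math.log(sum(detected) / len(detected)))  # log mean expression
        ys.append(math.log(dropout))                        # log dropout rate
        genes.append(gene)
    if not xs:
        return []
    # Ordinary least-squares line: log(dropout) ~ intercept + slope * log(mean)
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    # Positive residual = more dropouts than expected for that expression level
    residual = {g: y - (intercept + slope * x) for g, x, y in zip(genes, xs, ys)}
    return sorted(residual, key=residual.get, reverse=True)[:n_features]


toy = [
    [1] * 8 + [0] * 8,   # gene 0: mean 1, dropout 0.5
    [2] * 12 + [0] * 4,  # gene 1: mean 2, dropout 0.25
    [4] * 14 + [0] * 2,  # gene 2: mean 4, dropout 0.125
    [8] * 15 + [0] * 1,  # gene 3: mean 8, dropout 0.0625
    [4] * 4 + [0] * 12,  # gene 4: mean 4, dropout 0.75 (expressed in a subset)
    [3] * 16,            # gene 5: detected in every cell, skipped
]
print(select_features(toy, 1))  # [4]
```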

Another key feature is that scmap’s internal representation of the reference is greatly compressed, which means that it can be run on an ordinary workstation. Finally, scmap is modular, which means that a new dataset can be added to the reference without having to re-compute previously added datasets.
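The compression and modularity points can be illustrated with a small Python class. Each dataset is stored as one average profile per cell type, so thousands of cells shrink to a handful of vectors. Adding a dataset only processes the new data. This is our own illustration of the design idea, not scmap’s internal data structure, which we do not reproduce here. The class and method names are invented, and queries use plain Euclidean distance for simplicity.

```python
from collections import defaultdict


class ModularReference:
    """Hypothetical reference index: one centroid per cell type per dataset."""

    def __init__(self):
        self.datasets = {}  # dataset name -> {cell type: centroid profile}

    def add_dataset(self, name, profiles, cell_types):
        # Only the new dataset is summarised; existing entries are not recomputed.
        groups = defaultdict(list)
        for profile, cell_type in zip(profiles, cell_types):
            groups[cell_type].append(profile)
        self.datasets[name] = {
            ct: [sum(col) / len(members) for col in zip(*members)]
            for ct, members in groups.items()
        }

    def n_stored_vectors(self):
        return sum(len(c) for c in self.datasets.values())

    def nearest(self, query):
        """(dataset, cell type) of the closest centroid, by Euclidean distance."""
        best = None
        for name, centroids in self.datasets.items():
            for cell_type, centroid in centroids.items():
                d = sum((q - c) ** 2 for q, c in zip(query, centroid))
                if best is None or d < best[0]:
                    best = (d, name, cell_type)
        return None if best is None else (best[1], best[2])


ref = ModularReference()
ref.add_dataset("dataset_1", [[5, 0], [6, 1], [0, 5], [1, 6]], ["A", "A", "B", "B"])
ref.add_dataset("dataset_2", [[5, 5], [6, 6], [7, 7]], ["C", "C", "C"])
print(ref.n_stored_vectors())  # 3 (summarising 7 cells)
print(ref.nearest([0, 6]))     # ('dataset_1', 'B')
```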

Even though the HCA is years from completion, there are already large collections of scRNA-seq datasets available. In addition to the HCA, researchers are also building cell atlases for many of the model organisms that are widely used in biomedical research. The most impressive results to date are two large collections of reference data for the mouse. Researchers have already used scmap to compare the two mouse datasets, and with them the different methodologies used to collect the data. This provides an excellent demonstration of how scmap can help in analysing large datasets.

Since scmap carries out a simple yet fundamental operation, the comparison of cells from two datasets, we anticipate that it will become an integral part of many scRNA-seq analysis pipelines and that other, more complex tasks will come to rely on it. In particular, we believe that the speed and compression afforded by scmap will ensure that the HCA becomes an accessible and easy-to-use reference for the community.

About the authors:

Dr Martin Hemberg is a Group Leader at the Wellcome Sanger Institute, interested in quantitative models of gene expression.

Dr Vladimir Kiselev is currently the Head of the Cellular Genetics Informatics group at the Wellcome Sanger Institute and was previously a postdoctoral researcher in Dr Martin Hemberg’s group.

Related publication:
Kiselev VY, Yiu A and Hemberg M. (2018). Scmap – projection of single-cell RNA-seq data across datasets. Nature Methods. DOI: 10.1038/nmeth.4644


Job satisfaction: helping flatworms to chill out
25 Genomes

On Job Satisfaction

By: Dan Mead, the 25th Anniversary Sequencing Project Coordinator
Date: 08/05/2018

People often seem to gripe about their jobs, but I am happy to put myself firmly in the ‘extremely satisfied’ category. Besides working on one of the coolest projects at a world-renowned science-y place, I think the main reason might be the sheer diversity of what I do.

Here are some of the things that I’ve been up to over the past few weeks.

Communing with nature

I had a wander out to the Genome Campus wetlands to find the Himalayan Balsam. The plant’s Latin name, Impatiens glandulifera, comes from the way it spreads its seeds. When disturbed, the seed pods explode, flinging the seeds out to a distance of up to 7 metres!

So why was I wandering about a nature reserve? Well, we had run out of the sample we collected last year for genome sequencing. The new sample will be used to test DNA recovery methods before the nice people at Reading University send samples of Himalayan Balsam that are resistant to the rust fungus used to control its spread.

Exporting Golden Eagle heart

I’ve drafted a CITES application for exporting some Golden Eagle heart for analysis in the US. CITES is the treaty that governs trade in endangered species, including samples.

Being a Bat-man

Discussed the ins and outs [pardon the pun] of dissecting a bat.

Working the numbers

Made a list of all recorded species in the UK, then assigned them to families and worked out the average genome size of each family and the total amount of sequence that represents. In case you were wondering, it amounts to 85,000,000,000,000 letters [bases] of DNA, or the equivalent of over 20 million copies of War and Peace.
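The sum is simple enough to sketch in Python. Multiply the number of species in each family by that family’s average genome size, add everything up, then divide by the length of the book. The family figures in the example are placeholders, not real data. The length of War and Peace is also an assumption: roughly 3 million letters is our ballpark, and it varies by translation. Working backwards, ‘over 20 million copies’ holds as long as each copy is shorter than 4.25 million letters.

```python
def total_letters(families):
    """families: iterable of (number_of_species, average_genome_size_in_letters)."""
    return sum(n_species * avg_size for n_species, avg_size in families)


def copies_of_book(total, letters_per_copy):
    """How many copies of a book of the given length the total would fill."""
    return total / letters_per_copy


# Placeholder families (species count, average genome size): not real data
print(total_letters([(3, 1_000_000_000), (2, 500_000_000)]))  # 4000000000

UK_TOTAL = 85_000_000_000_000  # the figure quoted in the post
print(copies_of_book(UK_TOTAL, 3_000_000))  # about 28.3 million, assuming 3 million letters per copy
print(UK_TOTAL // 20_000_000)  # 4250000: longest copy for which '20 million copies' still holds
```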

Helping worms to chill out

Received some slimy worms in the post, put them in a fridge.

Bought (with my own money) some clay granules to try and make the worms a little more comfortable. They act as a contaminant-free soil that keeps them nice and moist.

Pre-fridge, this is what a reasonably happy flatworm looks like


Calling on the kindness of my wife

Booked onto a conference in Vienna (nice), then found out I need to fly out from Manchester on the day I’m coming back from a christening there, and return to LHR (not so nice). That isn’t the main problem, however. It means my wife needs to drive for 3 hours with two adorable, oversized bacterial/viral culture vessels (I call them Alex and Ben). Many apologies were given.

Becoming an accomplished host

Arranged catering for a meeting (also re-arranged the chairs/tables)

Had numerous tele-conferences with participants in Germany, USA, China, Hungary etc.

About the author:

Dan Mead is the 25th Anniversary Sequencing Project Coordinator for the 25 Genomes Project at the Wellcome Sanger Institute, Cambridge.
