All posts by sangerinstitute

From the Wellcome Sanger Institute, a charitably funded genomic research organisation

Roesel's Bush Cricket: The trouble with crickets and their ever increasing genomes... Image: Richard Bartz, Wikimedia Commons
25 Genomes

The trouble with Crickets

By: Dan Mead, the 25th Anniversary Sequencing Project Coordinator
Date: 23/04/2018

The type of cricket (Roesel’s Bush Cricket, Bicolorana roeselii or Metrioptera roeselii) we decided to sequence is interesting because it has spread out of its traditional salt-marsh environment to the interior of the country. We want to know if this is because it has adapted to live in less saline conditions or if it’s been possible due to the increased salt spreading on roads making corridors for the crickets to move along (or a combination of both).

This was one of the first species we received (from Björn Beckmann & Peter Sutton of the Orthoptera & Allied Insects group) late in the summer of 2017. We got three, all from a field in Oxfordshire, and it turns out they’re not averse to a little cannibalism – one of them ate the back legs of its roommate (the other was in a separate container, although it was also missing a leg) before I could separate them. Seeing as I was feeling a little mischievous I named them Hannibal, Oscar and Heather (despite them all being male – I took some creative license).

Getting the DNA from Oscar was one of the easier extractions: good yield and reasonable (although not the best) quality, certainly good enough for PacBio sequencing.

This is a Femto Pulse trace of the DNA fragment size. Here it’s mostly in the 20Kb+ range; ideally it’d be bigger – a perfect trace has one giant peak at ~165Kb.

Later extractions also gave better DNA for the 10X sequencing and so things were going swimmingly. I’d estimated that the genome size for this was ~2Gb, based on the average cricket genome from the Animal Genome Size Database, so quite large for an insect, but reasonable enough for this project.

Little did I know the seemingly unending horror show that now befalls us …

Initially things progressed as expected, the PacBio sequencing went well – producing >95Gb data. Likewise for the 10X, we got 120Gb from that, so ~50X coverage for both.

Things started to get a bit icky when the assembly first failed for PacBio, then for 10X. A PacBio miniasm assembly then came back with a revised genome size of 2.8Gb, bigger than expected but not too bad at this point, although the N50 was terrible (76Kb).

The next thing that happened was a kmer-based quality control report – this gave the genome size as 4.6Gb! We’re definitely into the realm of the unexpected now … this reduces our effective coverage to ~20X, waaay less than is needed for a decent assembly.

Finally (after running out of memory a few times) Supernova ran on the 10X data. This returned a gut-wrenching estimated genome size of 7.5Gb!

Combine this with the heterozygosity estimate of around 3.04% and everything looks a little wonky.
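To see how quickly the coverage erodes as the size estimate grows, here is a minimal sketch of the arithmetic (coverage is just total bases sequenced divided by genome size). The yields and size estimates are the ones quoted above; the function and dictionary names are made up for illustration, and the PacBio figure is treated as exactly 95Gb even though the real yield was a little higher.

```python
# Sequencing yield reported in the post, in gigabases.
# PacBio is a lower bound (the post says ">95Gb").
YIELD_GB = {"PacBio": 95, "10X": 120}

# Genome size estimates from the post, in gigabases, in the order they appeared.
GENOME_SIZE_GB = {
    "database average": 2.0,
    "miniasm assembly": 2.8,
    "k-mer QC report": 4.6,
    "Supernova": 7.5,
}


def coverage(yield_gb, genome_size_gb):
    """Average depth of coverage: total bases sequenced / genome size."""
    return yield_gb / genome_size_gb


def coverage_table():
    """Return {estimate: {platform: coverage}} for every size estimate."""
    return {
        source: {platform: coverage(y, size) for platform, y in YIELD_GB.items()}
        for source, size in GENOME_SIZE_GB.items()
    }


if __name__ == "__main__":
    for source, row in coverage_table().items():
        cells = ", ".join(f"{p} {c:.0f}X" for p, c in row.items())
        print(f"{source:>17}: {cells}")
```

With the Supernova estimate of 7.5Gb, the same data works out at roughly 13X for PacBio and 16X for 10X, which is why more sequencing is needed.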

So what went wrong?

I’ve just been back to the genome size database and there is an outlier in the sizes – the camel cricket (Ceuthophilus stygius) which is a cave cricket from North America.

By James St. John – Ceuthophilus stygius (camel cricket) inside entrance to Great Onyx Cave (Flint Ridge, Mammoth Cave National Park, Kentucky, USA) 1, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=39945246

This beauty has a genome size of 9.55Gb!

So the question is how likely is this to be the case (or close to) for our cricket?

Taking all the crickets with known genome sizes from the database (there aren’t that many – 7 – one of which is the gloriously named ‘unwelcome mole cricket’ Neoscapteriscus borellii) and putting them into the phyloT tree generator and iTOL (Interactive Tree of Life) gives you this:

Brown curry mole crickets in a can: I don’t think you can buy these anymore

Sorry, that’s just a can of curried crickets, the tree looks like this:

Unwelcome mole crickets are unwelcome in NCBI apparently, there’s no taxon number so no tree entry.

From this it looks like our Tettigoniidae bush cricket pre-dates our large-genomed friend the camel cricket (a Gryllacrididae) and split from the ‘true’ crickets (the Gryllidae) a while back. But how far?

Then we used another online resource, the timetree, to see when this split occurred. From the below you can see it was ~270MYA, which is a long time, plenty of time for some weird genome expansion to have happened I guess.

Gryllidae and Gryllacrididae separated 100MY before Tettigoniidae diverged from Gryllacrididae (~172MYA).

You may have noticed that this tree is a little different, this is for two reasons:

  • It’s a simple expansion of the last shared taxon group, the Ensifera.
  • The Gryllacrididae and Tettigoniidae split from the Rhaphidophoridae, not the other way around.

Before you ask, no I don’t know why, but I assume the latter is correct as the first tree lacks all the taxon groups for an input.

The sole example of the Rhaphidophoridae taxon has a 1.55Gb genome and as this line goes back to the common ancestor of the Roesel’s cricket it could be that our initial estimate is true OR, more likely, there’s been some horrible expansion that involves (multiple?) genome duplication events.

The thing that’s really annoying is my own lack of knowledge and tendency to make (in this case stupid) assumptions – who knew that Gryllacrididae and Gryllidae are actually further distant than Gryllacrididae and Tettigoniidae? Taxonomists probably, or someone who studied classics.

Anyway we’re doing some more sequencing to get extra 10x data, hopefully this will answer the question once and for all….stay tuned!

About the author:

Dan Mead is the 25th Anniversary Sequencing Project Coordinator, for the 25 Genomes Project for the Wellcome Sanger Institute, Cambridge.

More on the 25 Genomes Project:

25 Genomes Project web page 

Human Cell Atlas, Sanger Science

New computational method reveals where genes are expressed

By: Valentine Svensson
Date: 06.04.18

SpatialDE automatically identifies sub-structures (middle), and links these to genes that depend on spatial location (right) in mouse olfactory bulb data from Stahl et al 2016.

In the body, cells are often considered the atomic fundamental units. In a similar way to how atoms are structurally joined to form molecules, cells form tissues. The organization of these tissues lets different cell types work together, to enable organs in the body to perform their functions. These structures have been studied and catalogued for hundreds of years in the field of histology, using microscopes.

During the 20th century molecular techniques have enabled researchers to investigate how different genes and proteins are used in different parts of tissues, to understand how cell types collaborate in tissues. Large scale projects such as the Protein Atlas or the Allen Brain Atlas have been systematically performing molecular measurements of individual genes and proteins in tissues.

In the last decade, tremendous advancements in the scale and cost effectiveness of molecular measurements have been made. This has led to the analysis of single cell gene expression, i.e. which genes are switched on in a cell. This lets researchers define cell types from molecular data. Similarly, spatially defined molecular measurements of gene expression can now be made on thousands of genes at single cell resolution. Projects that would previously have taken hundreds of people and long time schedules can now be done by individual labs, meaning more types of tissues in more conditions can be investigated.

The most powerful new high throughput methods generate measurements of expression levels for tens of thousands of genes. At this scale just looking at all the genes will not be possible. Typically these sorts of data have been analysed by only looking at a handful of known marker genes.

We have now developed a method that tells us if there is a relationship between genes expressed in cells, and where those cells are located.

Our SpatialDE method filters and sorts all the genes according to how certain we are that cell location matters for the expression levels. In the main data we analysed for our paper, out of close to 12,000 genes measured only 67 genes were filtered as “spatial”. By focusing on this shortlist of genes, researchers can quickly discover genes previously unknown to be related to tissue structure.
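To give a feel for what "sorting genes by how much location matters" means, here is a toy sketch. It is not the SpatialDE method: it swaps in a much simpler classical statistic, Moran's I, which is high when neighbouring spots have similar expression. The grid, the gene names and the function names are all invented for illustration.

```python
from math import dist


def morans_i(coords, values, radius=1.0):
    """Moran's I with binary weights: spots within `radius` count as neighbours."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # constant expression carries no spatial signal
    num = 0.0
    weight_sum = 0
    for i in range(n):
        for j in range(n):
            if i != j and dist(coords[i], coords[j]) <= radius:
                num += dev[i] * dev[j]
                weight_sum += 1
    if weight_sum == 0:
        return 0.0  # no spot has a neighbour within the radius
    return (n / weight_sum) * (num / denom)


def rank_genes(coords, expression, radius=1.0):
    """Sort genes from most to least spatially structured."""
    scores = {gene: morans_i(coords, vals, radius) for gene, vals in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: a 4x4 grid of spots and three made-up genes.
coords = [(r, c) for r in range(4) for c in range(4)]
expression = {
    "left_half": [1.0 if c < 2 else 0.0 for r, c in coords],   # one big region
    "checkerboard": [float((r + c) % 2) for r, c in coords],   # neighbours always differ
    "flat": [1.0] * len(coords),                               # same everywhere
}

if __name__ == "__main__":
    for gene, score in rank_genes(coords, expression):
        print(f"{gene:>12}: {score:+.2f}")
```

The gene expressed in one contiguous region comes out on top, which is the kind of gene a shortlist of "spatial" genes is meant to surface.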

Tissues are often divided into sub-structures, based on visual appearance, or by expression of particular proteins indicating a specific function of that sub-structure. The brain, for example, has different layers, and so does skin; the thymus, on the other hand, consists of connected lobules with medullas inside.

The sub-structures are defined by different cell type compositions. For cells to have major functional differences they need to express many genes together that are specific to the function, which will be reflected on a whole tissue level. We created a second method which uses this property to automatically define tissue substructures. In one go, researchers obtain the genes defining the regions, as well as labels for the regions themselves.

This allows researchers to zoom into the structures of the tissue. The markers allow design of downstream functional experiments to investigate which genes cause the structure and which are a consequence of the structure. The spatial labels then allow researchers to investigate the interaction between structures, the development of the structures, and how the tissue performs its function.

Relating cell types to their spatial structure and organization in tissues is a major component of the ongoing Human Cell Atlas project. But the technologies for spatial gene expression measurements are also feasible for individual labs that want to study their tissue of interest on a genomic level. With our methods, researchers can answer new questions about the relation between genes and tissue structure that could not be answered before, which we demonstrate in our paper.

In the long term, genomic and quantitative spatial gene expression measurements, captured and analysed by methods such as SpatialDE, may form the basis of histology and pathology in the clinic. This would allow this area of medical diagnostics to become even more powerful and personalized.

About the author:
Dr Valentine Svensson was an EMBL PhD student supervised by Sarah Teichmann at the Wellcome Sanger Institute, collaborating with Oliver Stegle at the EMBL-EBI when this work was done.  He is now a postdoctoral scholar in the Division of Biology and Biological Engineering at Caltech, working with Lior Pachter on statistics for omics based cell biology.

Related publication:
Valentine Svensson, Sarah A Teichmann and Oliver Stegle. (2018). SpatialDE: identification of spatially variable genes. Nature Methods. DOI: 10.1038/nmeth.4636


25 Genomes

A cautionary tale about blackberries [2/2]

By: Dan Mead, the 25th Anniversary Sequencing Project Coordinator
Date: 09/03/2018

At the end of the last post things were looking up, a source of plant material was found, it was now just a case of waiting for the seeds to be delivered so we could grow some up.

This was just before I was due to go to the Plant and Animal Genome conference (PAG) in San Diego, California, in January this year. For those of you thinking “wow, that must be great, getting to go to a nice sunny place for work- must be like a free holiday”, no. Academic conferences are not an excuse for a jolly, the schedule for these generally means you work LONGER hours than normal.

PAG this year ran from Friday 12th January to Wednesday the 17th, including the weekend, with talks etc. scheduled from 8am to 6pm (not including extra workshops in the evenings). Marry that with a full day’s travelling on either end and you can hopefully see my point.

Anyway, I’ve never had a need to go to this conference before as I’ve not worked with plants or animals (except malaria, but that’s a disease, even though technically an animal) so I thought I’d go to a wide variety of talks* to see what the craic was.

*There’s everything from wheat to water buffalo, insect genome assembly to livestock breeding.

One of these that seemed vaguely relevant to the project was a talk on the genetics of cherries. These are a big deal in Japan, where the speaker was from, with red skinned fruit and white flesh being the most desirable traits.

£106 for 40 cherries anyone? wordpress.com

Some interesting stuff in this talk- apparently cherries have ~44,000 genes in a 350Mbp genome (humans have ~20,000 in a 3,000Mbp genome) and most trees there are bred from only a few (2 or 3 I think) original sources.
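For a sense of scale, here is the back-of-the-envelope comparison as a small sketch, using only the approximate figures quoted above (the function name is just for illustration):

```python
def genes_per_mbp(gene_count, genome_size_mbp):
    """Gene density: number of genes per megabase pair of genome."""
    return gene_count / genome_size_mbp


# Approximate figures as quoted from the talk.
cherry = genes_per_mbp(44_000, 350)
human = genes_per_mbp(20_000, 3_000)

if __name__ == "__main__":
    print(f"cherry: {cherry:.1f} genes per Mbp")
    print(f"human:  {human:.1f} genes per Mbp")
    print(f"cherry is roughly {cherry / human:.0f} times as gene-dense")
```

On those numbers the cherry genome packs in roughly 19 times as many genes per Mbp as ours.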

This talk was coming to an end and I was about to leave when up pops a lady with an announcement about the “Rosaceae Rosexec meeting” that was in a couple of days’ time. Blackberries are a member of this fruit family so I decided to crash the meeting (not really, I asked politely and was invited to attend).

As is fairly routine at these sorts of things, there was a ‘stand up and introduce yourself and your research’ bit at the beginning, which is all well and good. Being new, and having got a bit lost^ trying to find the room, I ended up near the back and was one of the last to speak up.

^PAG is a pretty big conference, there are over 3000 people there and there are dozens of rooms where talks/meetings/workshops happen.

Up I get and proceed to tell people that we’re sequencing the blackberry genome to a largely pleased audience when I notice one person giving me the daggers.

I’d already noticed her and was planning on chatting later as she mentioned blackberries in her intro. The rest of the meeting was great, lots of good work being done on soft fruit.

So at the end of the meeting I go and introduce myself again and explain what we’re doing in a bit more detail. Turns out that Margaret (Blackberry geneticist, not the Iron Lady) has already started sequencing the species and has sunk a bunch of her laboratory start-up seed money into it so that was why she was a bit miffed.

This was, however, the start of what I hope will be a very fruitful [ahem] partnership! After I’d told her that we were planning on releasing the data publicly and would be happy to finish the rest of the sequencing (as part of a new collaboration) things started looking up. We’re now working together, with other fruity people, to get this done. The combined efforts mean the cost is spread around nicely – and now I have actual experts in fruit genomes to help!

The lessons learnt here:

  • Don’t assume there’s only one species, there may be many that look the same (to the untrained eye)
  • Don’t be afraid to call people out of the blue, most often they’re as helpful as can be
  • Conferences are great for meeting people to work with
  • A bit of luck never hurts!

Finally, I met a chap at the Rosexec meeting who must have googled the 25 Genomes project whilst there as he approached me afterward with a bit of sage advice. We’re planning on sequencing the New Zealand Flatworm (it’s an invasive species in the UK) and he said we really should consult the Maori on this, which is now happening – updates to follow (if it’s interesting that is).

About the author:

Dan Mead is the 25th Anniversary Sequencing Project Coordinator, for the 25 Genomes Project for the Wellcome Sanger Institute, Cambridge.


25 Genomes, Sanger Life

A cautionary tale about blackberries…[1/2]

By: Dan Mead, the 25th Anniversary Sequencing Project Coordinator
Date: 08/03/2018

Blackberries – they’re everywhere right, in gardens, hedgerows, at the side of the road; in fact pretty much anywhere you go you can find a blackberry bush (?shrub, ?tree, ?thicket – who knows what the noun is!). This should make finding one for the project as easy as pie, I thought…

As an aside, before we included blackberry in the project I checked on the Kew Gardens plant database to see if they have a suitable genome and, hooray, they do. It’s 450Mb (about 1/13th the size of the human genome) and diploid as well – so no odd chromosome duplications* to worry about.

Genomes Assemble!

It’s important to have species that are haploid (single copy of each chromosome- lots of insects have haploid males) or diploid (two copies of each chromosome, like humans- we have 23 pairs). This is because putting together the bits of DNA from the sequencing is much more difficult if there are more than two copies of everything. It would be simple if the copies were exact but there are always small differences between them; sometimes single bits of DNA vary, sometimes small sections are missing or duplicated or mirrored (inverted) etc.

Imagine that your genome of interest only had one chromosome- pictured as say, a jigsaw puzzle, let’s then say as a picture of a cat- in this case the @genomecat, Quincy.

This is the easiest to assemble, to MASSIVELY oversimplify things, you just match the edges to get the picture.

For two copies (diploid) it’s a little more complicated, twice the number of pieces and some will be a little different. Here you filter out the bits that look different and put these in the second chromosome (or ‘alternative haplotype’).

Things get really hard when you enter the murky world of polyploidy (lots of copies of chromosomes). In this example our cat has 4 chromosomes (tetraploid) – all slightly different. The problem comes in trying to put each different piece in a separate chromosome, this is fine if 4 pieces all look different (or the same) but if there are 2 identical pieces how do you know where to put them?

It’s a simplistic way of looking at things but [sort of] gets to the point – assembling genomes like this is tricky, so we like to avoid this if possible!

Back to Black(berries)

Ok, on with the blackberry story. Finding a plant was, unsurprisingly, very easy: there’s a big thicket on the grounds that’s about 3m high and must cover 100 sq. m or so. So we got some leaves and stored them to await extraction.

A few weeks later I went on a nice trip down to the Natural History Museum in London to chat to some botanists (Fred Rumsey and Mark Carine) about preserving some samples for their collection. This involves getting some of the plant and pressing it flat/drying it out to act as an example of what the species looks like – so people can check we’ve got what we say we have (anyone can go and look at the collection there, by the way, some of the plants were collected hundreds of years ago!).

I think this is a ragwort.

We got chatting and I mentioned that we were doing the blackberry as part of the project. At this point Fred drops a casual comment that nearly made me soil myself:

[paraphrased]

“That’s interesting, which species are you doing? There are well over 300 in the UK…”

“Go on,” I say, “pray tell me about these 300 species.” (the exact conversation escapes me, it’s like a half remembered feverish nightmare now)

“Oh yes, there’s a whole book on them- I think we’re up to about 360 by now- you know the only way to identify a species confidently is to observe its life-cycle for at least a full year, most likely two to be sure. Wait whilst I find the book…”

[large thud as this tome hits the desk]

“Here it is, you can see how to identify them from this.”

“Thanks Fred,” [I hope I said that and not what I was thinking] “very interesting.”

And thus began the blackberry saga.

The Blackberry Saga

First thing to do was to find out what the species was that we had, however if you remember the ‘observe for a year’ bit this would take too long. So the next option (seeing as it’s definitely some kind of blackberry) was to try to find out the ploidy, sequence it and get the species ID later. Now this isn’t ideal so I also thought it would be a good idea to try to find a source that already has a known diploid species.

Turns out neither of these things was quite so simple.

One of the best ways of finding out the number of copies of chromosomes is to actually count them by looking at them using a microscope (this is called karyotyping), noting how many look the same, and the total number. So I asked our specialist and it was bad news – it’s too difficult to do in the time-frame but he put me on to a Professor from the University of Leicester who knows about blackberry genomes and things.

In an unlikely case of serendipity, it turns out this particular Professor’s mother worked on blackberries for a book in the 1950s, so he sent me a bunch of stuff to read and we had a nice chat on the phone. Now I had information on which species I should be looking for, Rubus ulmifolius, a relatively common diploid native to the UK. This species is also found in the local area around the Genome Campus where we sampled our blackberry from so I had a small measure of hope we had stumbled upon the right one.

Red boxes are locations of R. ulmifolius. On a zoomed out view it’s apparent there wasn’t any surveying east of the red boxes.

Of course I still had to find out and, after a few phone calls and suggestions, I contacted Julie Graham at the James Hutton Institute who had a test that they could do. It took a few weeks and in the end this was a bust, the Campus plant was tetraploid.

I may have been a little disappointed

The only option now was to find the plant from somewhere else.

There are a number of institutions that do soft fruit research (NIAB/EMR, Hutton, ART, Reading University, Leicester University, Earlham Institute, etc.) and I called them all. I also called a bunch of commercial growers, yet none of these had the R. ulmifolius I was looking for.

Eventually I did get one lead: the USDA clonal repository in Oregon (US). They have a germplasm repository and, lo and behold, you can order plants from there!

So I did.

Job done. Sometimes things turn out to be easier than expected if you have the right information. Or so I thought…

About the author:

Dan Mead is the 25th Anniversary Sequencing Project Coordinator, for the 25 Genomes Project for the Wellcome Sanger Institute, Cambridge.


Sanger Science

Sequencing a superbug: How typhoid became extensively drug-resistant

By: Gordon Dougan and Elizabeth Klemm, Wellcome Sanger Institute and the Department of Medicine, University of Cambridge
Date: 20/02/18

Reprinted from the Take on Typhoid website, www.takeontyphoid.org

The bacterium that causes typhoid fever, Salmonella Typhi, is a smart one.

I know this because our laboratory has been sequencing the DNA of S. Typhi strains that infect people around the world, and we have found evidence for an accelerating evolution of resistance to antibiotics.

After antibiotics were first introduced to treat typhoid in the 1940s, typhoid’s mortality rate plummeted from around 26 percent to just 1 percent. But within 20 years the first cases of typhoid resistant to chloramphenicol, one of the three first-line treatments for typhoid, appeared, signaling a battle between antibiotic and bacteria. Typhoid strains resistant to all three first-line treatments, which are known as multidrug-resistant (MDR) typhoid strains, were quick to follow those resistant to only one antibiotic. And when doctors began using second-line antibiotics (more modern but expensive versions) such as fluoroquinolones, typhoid followed with resistance against those drugs, too.

A particularly aggressive strain (actually a genetic clone) of MDR typhoid, H58, first emerged in the 1990s. This H58 strain has grabbed our attention because, while other MDR typhoid strains have mostly remained in the local area where they first appeared, H58 has quickly spread across the globe. Currently, the majority of all global MDR typhoid strains can be classified as H58. It’s a quick learner that is able to not only evolve more easily, but also multiply and spread more rapidly than other typhoid strains.

The global prevalence of H58 typhoid strains, 2017

Recently, the world saw yet another evolution of the H58 strain. In November 2016, doctors in Sindh, Pakistan, observed cases of a novel H58 S. Typhi strain that was resistant to not only the three first-line antibiotics and fluoroquinolones, but also a third-generation cephalosporin called ceftriaxone. This new strain is classified as extensively drug-resistant (XDR) typhoid. It is only susceptible to a limited number of antibiotics, which can be expensive and difficult to access, especially for low- and middle-income countries.

In an effort to learn more about this new XDR typhoid, our team, working closely with outstanding colleagues in Pakistan, quickly went to work to sequence its DNA, research that was recently published in mBio. We found three concerning issues. First, we found that S. Typhi has the ability to transform from MDR to XDR in a single step. By acquiring just one highly mobile DNA molecule (plasmid) from another bacterium such as E. coli, MDR H58 typhoid in any location can potentially become XDR typhoid.

Second, we found that the new XDR strain is an end product of a global chain of antibiotic resistant bacteria. The plasmid that created XDR typhoid is present in a variety of diverse geographic settings across the globe, and once created, XDR typhoid rapidly reproduces itself. This is a concerning development because previous reports of XDR typhoid have been sporadic and isolated, while this particular strain has already caused large-scale outbreaks and is spreading within and outside Pakistan. It has already been carried as far as the United Kingdom.

Finally, our findings confirm the fact that the antibiotic arsenal for typhoid treatment is fading. We can no longer rely on antibiotics to treat typhoid fever. We need to shift our paradigm away from treatment and toward prevention.

Fortunately, we now have a promising new preventative tool. Typhoid conjugate vaccines are a newly WHO-prequalified class of typhoid vaccines that, compared to older typhoid vaccines, are longer-lasting, require fewer doses, and can be given to children as young as 6 months of age. Because they can be given to young children, countries can include typhoid conjugate vaccines in routine immunization programs, developing widespread immunity to typhoid and stopping dangerous strains like H58 from spreading and evolving. When implemented alongside improvements in water, sanitation, and hygiene, these vaccines can have the power to take on typhoid for good.

Typhoid may be smart, but we know how to outsmart it. We just have to act now.

About the Author:
Professor Gordon Dougan is a Group Leader at the Wellcome Sanger Institute and University of Cambridge Department of Medicine.

Elizabeth Klemm is a postdoctoral researcher in Gordon Dougan’s research group at the Wellcome Sanger Institute.

Related publication:
Elizabeth Klemm et al. (2018) Emergence of an extensively drug-resistant Salmonella enterica Serovar Typhi clone harboring a promiscuous plasmid encoding resistance to Fluoroquinolones and third-generation Cephalosporins. mBio. DOI: 10.1128/mBio.00105-18

About spiders (specifically the Fen Raft spider, Dolomedes plantarius) and where to get them from.
25 Genomes, Sanger Science

Getting a hold of samples… [part 2]

By: Dan Mead, the 25th Anniversary Sequencing Project Coordinator
Date: 16/02/2018

So far I’ve talked about Golden Eagle and Red Squirrel, also known by the moniker “charismatic megafauna”, which is a fantastic description of large cute/interesting things I first heard from Mark Blaxter.

So, I mentioned that some of the species are quite challenging to get but there are some that are also easy to sample (along with who provided them – thanks go to them):

  • Himalayan Balsam – Lisa Outhwaite, found on the Genome Campus
  • Oxford Ragwort – Lisa Outhwaite, found on the Genome Campus
  • Summer Truffle – from Dr Paul Thomas, commercial source (the exact location is confidential though)
  • Common Starfish – from Prof Maurice Elphick, keeps a tank full for other ongoing work
  • King Scallop – Dr Susanne Williams, bought from a fishmonger’s!
  • Asian Hornet – Dr Seirian Sumner, already had a collection
  • Turtle Dove – Dr Jenny Dunn, had samples from previous work
  • Otter – Dr Frank Hailer, from routine health surveys
  • Roesel’s Bush-cricket – Dr Björn Beckmann, they’re quite abundant now so easy to find
  • Fen Raft Spider – Dr Helen Smith, ditch maintenance means they ‘pop up’ at the time
  • Robin – Dr Jenny Dunn, had samples from previous work
  • Grey Squirrel – Kat Fingland, has samples from ongoing work

Although these were easy to get that doesn’t mean there aren’t some quite interesting anecdotes associated with the sample collection.

Summer Truffles

Summer truffles, for example, are pretty valuable (circa £400 per kilogram) so the reason we don’t have the exact location is to prevent rival hunters (not sure if you hunt for a truffle or forage?) from plundering the area.

King Scallop, Great Scallop, Coquilles Saint-Jacques

Also, imagine the confusion in the voice of the chap at the end of the phone when I ring up and ask the fishmonger if they have a GPS location for the source of their scallops. Then think what the guy must have been thinking when I try to explain why, hopefully he got it but I’m not so sure! This is why we need to reach out and explain science to the public more, there’s not a great deal of exposure to genomes/genetic research if it’s not human related.

Turns out they don’t know exactly where they came from anyway; the scallops hail from the Shetland Isles – might have to do some genotyping to find out!

Roesel’s Bush-Cricket

Crickets it turns out are quite the eaters and not wanting to limit their diet they are, like us, omnivorous. Unlike us, however, at least nowadays, they do practice cannibalism (not sure how you ‘practice’ mind you, maybe start with just a lick?!). It seems they can lose legs quite easily this way, one named Oscar had a run-in in their container with Hannibal and lost two legs, the third (Heather) just lost a single one prior to arrival.

Fen Raft Spider

Did you know you need a special license to collect Fen Raft spiders? This is because they’re red-listed like the Eagle but, thankfully for me, Helen has one. She has also raised many thousand spiderlings in her kitchen!

Check out her website (http://www.dolomedes.org.uk/) and if you fancy a challenge see how easy it is to spot them (this is, after all, the largest UK spider) in their habitat here.

Clearing a fen ditch - home to the Fen Raft Spider (one of the 25 Genomes we are sequencing)

Grey Squirrel

Grey Squirrels are regarded as a pest species. This means that it’s legal to hunt them without a special license, provided that you don’t cause any unnecessary suffering. We are NOT, however, doing this for the project as it’s not the most ethical thing when people are already collecting them for other research.

Also, did you know that you can buy squirrel pie? Not had it myself but could be tasty…

About the author:

Dan Mead is the 25th Anniversary Sequencing Project Coordinator, for the 25 Genomes Project for the Wellcome Sanger Institute, Cambridge.

More on the 25 Genomes Project:

25 Genomes Project web page 

25 Genomes, Sanger Life, Sanger Science

Getting a hold of some samples…

By: Dan Mead, the 25th Anniversary Sequencing Project Coordinator
Date: 29/01/2018

[Because gathering samples is proving to be quite a major task, I’m going to split this across several posts]

First things first – find a sample

The first, and often most difficult, part of getting a sample for the 25 genomes project is finding out where to get it from.

There are a number of reasons for this but it essentially boils down to the fact that the Sanger Institute has always focused¹ on human health and disease so we don’t have a particularly great list of contacts for this project.

¹There have been some dalliances into other areas in the past, notably: Cod, Coelacanth (it’s a fish, known as a ‘living fossil’, although I prefer something that implies it’s been a long-term success like ‘Pan-eon species’, a description I may have made up), Tasmanian Devil Cancer, Tomato and a butterfly.

The ones that are most difficult to get are the ones that the steering group decided upon independently. This is because, without a scientist/researcher/expert putting forward the species, there isn’t anywhere to start from.

This is where working in science has a great advantage: collaboration. In the fields of Agricultural, Plant & Animal and Environment/Ecology sciences, half of all articles were written by multiple institutions by 2009² and, if the trend has continued, it should be over 60% by now.

²Gazni, A., Sugimoto, C. R. and Didegah, F. (2012), Mapping world scientific collaboration: Authors, institutions, and countries. J. Am. Soc. Inf. Sci., 63: 323–335. doi: 10.1002/asi.21688

This is one reason why we need to collaborate more, and it will be the subject of a later post.

How traditional biologists and computer biologists work together. #CartoonYourScience by @redpen/blackpen https://twitter.com/redpenblackpen

(for more like this check out the wonderful @redpenblackpen)

In practice this should mean that we scientists are a helpful bunch, and it turns out this is true. Whereas cold-calling/emailing people about the ‘accident you’ve been recently involved in’ or ‘the security breach on your Microsoft device’ is extremely annoying [pro-tip: pass the phone to your pre-school child if this happens, the results are normally quite amusing], doing the same to a scientist to offer them free sequencing of their species of interest is generally quite warmly received!

Getting a Golden Eagle(‘s DNA)

So let’s have a closer look at some of the species, firstly the Golden Eagle.

I would have thought that this would be a tricky one – they’re protected by a bunch of laws/regulations which means that without special licences you can’t mess with them. In fact even the locations of the nests are a closely guarded secret, as the birds are still being illegally killed and their eggs taken by collectors.

Turns out that a quick google and one email can lead to a great result, although it’s tinged with a bit of sadness which I’ll get to in a bit. I initially contacted Professor Anna Meredith at Edinburgh University with a general ‘can you help me with blah, blah, blah’ as she works with a number of species we were interested in (in this case I was actually after Red Squirrels) and she forwarded this on to Dr. Rob Ogden, also at Edinburgh.

As it turns out he is already working on Golden Eagles and was planning on doing some sequencing with some collaborators in Japan (they have eagles there too). Even better he had samples already from (here’s the sad bit) chicks that had died in the nest (plus one found rather suspiciously in a long abandoned nest).

So, one sample down, 24 to go!

[By the way I’m not going to go into the logistics and ENORMOUS cost of shipping things on dry ice, just assume that things arrive magically, but I may expand on why they need shipping this way some other time.]

Something squirreled away

Anna couldn’t help out with the Red Squirrel however, so I asked the National Trust who maintain a lot of the areas where these cute little critters still live:

UK Squirrel Distribution Maps, 1945 and 2010. Image Credit: Craig Shuttleworth, RSST

A nice lady called Laura put me in touch with the Head of Conservation (David Bullock) who in turn linked me to Andrew Brockbank at Formby Point who then led me to Kat Fingland (Nottingham Trent University) and Rachel Cripps (Red Squirrel Officer). All this took about a month and a bit but I finally had the right people. Thankfully we didn’t need any extra licensing to get some samples as they were already collecting from animals that had died from natural or accidental causes.

2 down, 23 to go!

Ethical and responsible sampling

It’s worth mentioning at this point that for this project we want to limit the impact of our sampling as much as possible and therefore have had it approved by our AWERB (Animal Welfare and Ethical Review Body). What this means is that wherever possible we do not kill any animals solely for the project, although in practice this is easier said than done and it does create some difficulties.

  1. For some animals this is not a problem as they are large enough that we can take a small amount of blood (less than 1ml) but others are too small for this to be possible (pipistrelle bats for example weigh around 5g and have only 0.5ml blood in total). This means that we need to get hold of whole animals AND as some of our species are protected (Golden Eagle, Red Squirrel etc.) they need to have already passed away for us to be able to use them.
  2. Another related issue is that special licences are needed to take blood samples from protected species, even if they are large enough for this to be possible. Given the amount of time available for the project, getting these isn’t really an option, so again we need animals that have died naturally.
  3. The nature of the sequencing technology we’re using means that we need to get really long bits of DNA (upwards of 150,000 base pairs – that’s the A-T/G-C parts of DNA). The problem is that when we use animals that have died of natural causes we need to find and sample them really quickly: as soon as the animal dies the DNA begins to break up through the natural decomposition process.
  4. The really small critters (invertebrates like the Roesel’s Cricket for example) are next to impossible to find when they’ve died, as they tend to be eaten by other things and are hard to spot unless they move. In these cases we have no choice but to take live creatures and euthanise them as humanely as possible.
  5. Plants and fungi are somewhere in the middle: we need quite a lot of material (DNA extraction is more difficult), but ethically it’s acceptable to take bigger samples, so in these cases we take cuttings or fruiting bodies.
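
For the programmers among you, the five points above boil down to a small decision rule. What follows is only a rough sketch of that reasoning, not anything we actually run: the `Candidate` class, its field names and the `sampling_route` function are all made up for illustration, and the order of the checks is my own reading of the list.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical description of a species we would like to sample."""
    name: str
    kind: str  # assumed labels: "vertebrate", "invertebrate", "plant" or "fungus"
    protected: bool = False  # would need a special licence to sample
    large_enough_for_blood: bool = False  # can spare a small blood sample (< 1 ml)


def sampling_route(c: Candidate) -> str:
    """Return the sampling approach suggested by points 1-5 above."""
    if c.kind in ("plant", "fungus"):
        # Point 5: bigger samples are ethically acceptable.
        return "take cuttings or fruiting bodies"
    if c.kind == "invertebrate":
        # Point 4: dead specimens are next to impossible to find.
        return "collect live and euthanise as humanely as possible"
    if c.protected:
        # Points 1 and 2: no time to obtain licences, even for blood.
        return "use animals that have already died, sampled quickly"
    if c.large_enough_for_blood:
        # Point 1: a small blood sample is enough.
        return "take a small blood sample"
    # Too small for blood: a whole animal is needed.
    return "obtain a whole animal, ideally one that has already died"


if __name__ == "__main__":
    examples = [
        Candidate("Golden Eagle", "vertebrate", protected=True, large_enough_for_blood=True),
        Candidate("Roesel's Bush-Cricket", "invertebrate"),
        Candidate("Summer Truffle", "fungus"),
    ]
    for c in examples:
        print(f"{c.name}: {sampling_route(c)}")
```

Point 3 (the DNA starts breaking up as soon as the animal dies) isn’t a branch of its own here; it’s the reason the word ‘quickly’ appears in the protected-species route.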

So that’s it for this one, more on sample collection to come…

25 Genomes, Sanger Science

On choosing the 25 species for the 25 Genomes Project

By: Dan Mead, the 25th Anniversary Sequencing Project Coordinator
Date: 08/01/2018

For those that don’t know (and until recently I could include myself in this group) there are A LOT of species on and in the earth. Currently it’s estimated that there are 2 billion! (2,000,000,000; see http://www.journals.uchicago.edu/doi/10.1086/693564 for details). Most of these are bacteria, and we’re not looking at those for the 25 Genomes project, but this still leaves about 450 million to choose from.

To make it easier for ourselves, we also decided to limit ourselves only to the 1.5 MILLION species that have currently been described and catalogued. And, to help us along a bit more, we decided that only species found in the UK would count. According to the National Biodiversity Network, that brings the number down to ‘only’ 56,674. Now if you choose to only look at the local area surrounding the Sanger Institute then it’s a much more manageable 318.
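
To get a feel for how quickly those numbers shrink, here is a back-of-the-envelope sketch using only the figures quoted above. It treats each figure as a straightforward narrowing of the one before, which is a simplification on my part, and the variable and function names are purely illustrative.

```python
# Species counts quoted in the post, from broadest to narrowest.
funnel = [
    ("estimated species on Earth", 2_000_000_000),
    ("excluding bacteria", 450_000_000),
    ("described and catalogued", 1_500_000),
    ("found in the UK (National Biodiversity Network)", 56_674),
    ("local area around the Sanger Institute", 318),
]


def shrink_factors(steps):
    """Return (label, count, fold reduction from the previous step) per step."""
    rows = []
    previous = None
    for label, count in steps:
        # The first step has nothing to compare against.
        factor = None if previous is None else previous / count
        rows.append((label, count, factor))
        previous = count
    return rows


for label, count, factor in shrink_factors(funnel):
    note = "" if factor is None else f" (about {factor:.0f}x fewer)"
    print(f"{count:>13,}  {label}{note}")
```

Even the ‘manageable’ local list of 318 is still more than twelve times the 25 we can actually sequence.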

However, it wasn’t going to be that easy. In the spirit of the Sanger’s inclusive approach to science, the Steering Group for the 25 Genomes project were concerned that such a narrow list was ‘too parochial’ and directed that the species sequenced should be a representative group of organisms from the whole of the UK.

So, how do you filter more than 56,000 species down to just 25?

The first thing to do was to break down the problem and the idea of a 5×5 matrix was mooted, discussed and agreed upon surprisingly quickly. Rather unsurprisingly, coming up with five different categories was not as straightforward as it might first appear. While some were no-brainers (iconic species for instance), getting all five nailed down was tricky.

The wisdom of crowds

So we put out a call for suggestions to the whole Wellcome Genome Campus, to draw on the collective wisdom of the more than 2000 people who work here.

The results were, by turns, pleasing, odd, not-at-all-answering-the-question and esoteric. Here are some examples:

  • Species for which Britain has major global richness and conservation responsibility
  • Female emancipation in the wild
  • Unusual in terms of genetic load accumulation rate and mechanism
  • The three-toed sloth (which is neither a theme nor from the UK)
  • 25 local authors (and then we would really have 25 ‘novel’ genomes)
  • Species imported to the UK, which are making our lives healthier and happier (possibly a politically motivated suggestion)
  • What is ‘down there’ (in the detritus level down on the Ocean floor).

Finding five themes

Armed with these suggestions, the 25 Genomes Steering Group got back together to hammer out the final five categories. Here’s what we decided upon, reasoning that these themes should give a broad range of types of organism and habitats to sample:

5 Themes for the 25 Genomes Projects: Flourishing, Floundering, Cryptic, Iconic and Dangerous

Critical criteria

We also came up with a list of criteria that the species must meet:

  1. Scientific justification must be solid – are there good questions that can be answered by the genome sequence being made available?
  2. No decent draft sequence currently available
  3. Sample availability – some organisms are too small, others are too protected, while others are too seasonal for collection
  4. Tractable genome – some organisms have genomes that are incredibly complex and would take up too much time and resource. For example, many plants have cells that contain multiple copies of the same[ish] chromosomes, a phenomenon known as increased ploidy. (A hexaploid genome has SIX copies of each chromosome, and some plants have even more.)

Now there comes the hard part, actually getting the list of species. As mentioned in a previous post, our public engagement team suggested that we let the public decide five of the species, leaving us just 20.

Great you might think, as it means we don’t need to do as much work, but you’d be sadly mistaken. The reality was that I now needed a list of 20 to start collecting right away AND another 40+ that the public could vote on to decide the final five!

It’s who you know…

Rather splendidly, we have a senior member of the Natural History Museum London on our steering group, which meant we could exploit their contact list of some 400+ partner groups of wildlife experts. With this in mind I made a SurveyMonkey survey (it’s still about so you can check it out here; feel free to fill it in, you never know, we might want to do more!) that, in my mind at least, cunningly hid the criteria in the questions. It also deliberately did not mention the themes so as not to steer people in any particular direction.

From this I got 99 responses (again discussed earlier) that made up most of the public vote and the 20* for getting on with; the latter are in the table below:

Cryptic | Dangerous | Floundering | Flourishing | Iconic
Brown Trout | Indian Balsam | Red Squirrel | Grey Squirrel | Golden Eagle
Common Pipistrelle | King Scallop | Water Vole | Ringlet butterfly | Blackberry
Carrington’s Featherwort | New Zealand Flatworm | Turtle Dove | Roesel’s Bush-Cricket | European Robin
Summer Truffle | British Mosquito | Northern February Red Stonefly | Oxford Ragwort | Orange-tailed Mining-bee
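
If you like to see the 5×5 matrix as a data structure, here is a minimal sketch built from the table above. The names `THEMES` and `matrix_summary` are made up for illustration; the one public-vote slot per theme comes from the ‘one winner per zone’ rule of the public vote.

```python
# The 20 species chosen by the steering group, grouped by theme (from the table).
THEMES = {
    "Cryptic": ["Brown Trout", "Common Pipistrelle",
                "Carrington's Featherwort", "Summer Truffle"],
    "Dangerous": ["Indian Balsam", "King Scallop",
                  "New Zealand Flatworm", "British Mosquito"],
    "Floundering": ["Red Squirrel", "Water Vole",
                    "Turtle Dove", "Northern February Red Stonefly"],
    "Flourishing": ["Grey Squirrel", "Ringlet butterfly",
                    "Roesel's Bush-Cricket", "Oxford Ragwort"],
    "Iconic": ["Golden Eagle", "Blackberry",
               "European Robin", "Orange-tailed Mining-bee"],
}


def matrix_summary(themes, public_slots_per_theme=1):
    """Return (chosen by steering group, left to public vote, total)."""
    chosen = sum(len(species) for species in themes.values())
    public = len(themes) * public_slots_per_theme
    return chosen, public, chosen + public


chosen, public, total = matrix_summary(THEMES)
print(f"{chosen} chosen + {public} public vote = {total} genomes")
```

Four species per theme from us, plus one per theme from the public, fills all five rows of five.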

All in all, this took about 5 months to get to this stage as the species also needed to be individually reviewed to see if they met the criteria and then approved by the steering group.

Now the only problem is actually getting the species’ DNA: collecting specimens and some lab work to follow, the supposed easy part….

More on this to come!

*Why we chose the above 20 species

Name | Why sequence it?
Summer Truffle | There is disagreement in the literature as to whether this truffle is one or two separate species, plus it grows underground and is therefore largely unseen and difficult to locate. Prices for those collected in the UK remain relatively stable at around 400GBP per kilo. Known as mycorrhizal, these fungi form a symbiotic association with a host plant on which they are dependent throughout their lifecycle. The sequencing of UK T. aestivum syn. uncinatum populations would be pivotal in helping to answer questions about modes of reproduction and life cycle, as well as aiding in some core speciation questions.
Brown Trout | The Brown Trout has three forms that differ in their migratory patterns. One form remains in the locality of its birth, where it will live out its life, spawn and die. The second type migrates from lakes to streams and rivers to spawn but remains in fresh water. The third form migrates to the sea/ocean and remains there for much of its life, only returning to spawn. There appears to be no genetic difference between these forms, also known as anadromous (migratory) and sympatric (resident). Additionally, the Wellcome Genome Campus is built around an 18th century red brick hall, Hinxton Hall, also known as Trout Hall, where a carved stone trout is prominently displayed over the main door to the croquet lawns.
Carrington’s Featherwort | This is selected as a representative of the liverworts, an ancient plant group predating flowering plants. It is one of the characteristic liverworts of very high rainfall areas in Scotland, and thus a representative of one of the very special groups of the British biota confined to such high-rainfall areas. Outside Scotland, it is only found in Ireland (extremely rare), the Faeroes and the Himalayas. The Scottish plants are apparently all male – like the Ents, the sexes have become separated in this species and the nearest females are in the Himalayas.
Common Pipistrelle | Until recently this bat was believed to be a single species; however, it is now known to be two species (common/soprano), with one other (Nathusius’) also being resident in the UK. Studying the genome will allow us to investigate the origins of the split between the two species, when and why it occurred.
Indian Balsam | A highly invasive weed species that takes substantial effort to control. Control methods based on the findings would have important implications for wetland and river management.
King Scallop | Pecten maximus has been found to contain the Amnesic Shellfish Poisoning toxin, domoic acid, which accumulates after they consume algae/diatoms, especially in the event of algal blooms. This risk is regarded as a significant threat to both public health and the shellfish industry. Some studies have suggested that global warming is resulting in greater reproductive success for P. maximus in the UK; however, concerns have been raised over increasing mortality, declining recruitment and spawning stock biomass in several Scottish populations. Pecten maximus is also of interest scientifically because of its unusual vision and because its two shell valves are coloured differently. Identifying molecular pathways for shell pigment production in Mollusca has lagged behind studies of vertebrates and terrestrial invertebrates, and is a major gap in our understanding of how colour has evolved in the natural world. Vision in Mollusca is also of great interest because of the many different eye morphologies and the fact that very few species are thought to see in colour.
New Zealand Flatworm | New Zealand flatworms prey on earthworms, posing a potential threat to native earthworm populations. Further spread could have an impact on wildlife species dependent on earthworms (e.g. Badgers, Moles) and could have a localised deleterious effect on soil structure.
British Mosquito | Mosquitoes are an important disease vector and there has been speculation that an increase in the distribution of other species due to climate change could allow the re-introduction of diseases such as malaria to the UK.
Red Squirrel | Sequencing the whole genome of the native red squirrel will hopefully provide new tools and resources for reversing their decline and aiding their long-term conservation in the UK. For example, this research could reveal key insights into how red squirrels have adapted to living in an urban environment. This study could also provide further information for managing the spread of diseases and helping to protect the red squirrel from the fatal squirrelpox virus, as well as a deeper understanding of the impact of newly-discovered diseases.
Northern February Red Stonefly | These stonefly only inhabit the purest of waters and as such are very limited in their habitats and may struggle to adapt to climate change. Brachyptera putata is an endemic UK stonefly. There have been suggestions that other European Brachyptera species may be synonyms of B. putata. Sequencing would determine whether it is a true UK endemic.
Turtle Dove | Turtle Dove numbers have fallen by a staggering 93% since 1970 and the species now resides on the Global Red List for Endangered Species. Smaller than its collared cousin, the Turtle Dove is now only found in eastern England, where farmers are working with the RSPB to create feeding habitats, the destruction of which is blamed for the bird’s decline.
Water Vole | The Water Vole is the UK’s fastest declining mammal and efforts to help the population maintain genetic fitness would benefit from having the genome sequenced. Arvicola is a fantastic example of a small mammal genus that survived through the last glaciation, and has adapted to a range of habitats across Europe and much of northern Asia.
Oxford Ragwort | The Oxford Ragwort is representative of a species being introduced and excelling in another habitat. It was collected from the slopes of Mount Vesuvius sometime in the 17th Century, and planted in Oxford where it rapidly colonised the area due to its natural hardiness, and could grow on urban landscapes too (sides of buildings, on stairs, etc.). When railways were introduced to the UK landscape, this facilitated the spread of Oxford Ragwort across the UK (it can be found growing along railway tracks today). Sequencing the genome would increase our understanding of a non-native species excelling in a new habitat and may expand on our understanding of the ecology of flowering plants.
Roesel’s Bush-cricket | Once restricted to the south coast and estuaries (saltmarshes), it is now widespread, possibly due to climate change and the spreading of salt on UK roads.
Ringlet butterfly | Despite an overall decline in butterflies over the last 50 years, the ringlet has increased its population by nearly 400%. It’s one of the few to fly on overcast days and has an interesting dwarf form that appears at 600ft, increasing until 100% of the population is this form at 1000ft.
Grey Squirrel | As the anti-hero for the red squirrel, investigating how/why the squirrelpox virus is tolerated.
Blackberry | Good opportunity for citizen science, population genomics specifically for schools engagement. Also commercial soft-fruit genetics as it is an important and expanding food crop.
Golden Eagle | This is an iconic UK species that has suffered from hunting and pesticide poisoning in the past, leading to extinction in all parts of the UK except Scotland where there are still fewer than 500 breeding pairs.
Orange-tailed Mining-bee | This species is conspicuous and attractive, one of the mining bees that is more likely to have come to the attention of the general public. It is widespread and common throughout the United Kingdom, flying in spring. It is a component of natural pollination services which can ensure crop pollination in the absence of honeybees, and also the pollination of many wild and garden flowering plants, ensuring their genetic diversity and conservation. In the UK, of 276 species of bee, there is only one honey bee and a score of bumblebees; the great majority of native bees are mining bees, including 68 species of Andrena. The genome sequence itself will be useful for comparative study of the genomes of this solitary bee with the available genomes of social bees, in terms of gene composition relevant to sociality.
European Robin | Robins use vision-based magneto-reception and the mechanism is not fully understood, although it has been shown that it may involve quantum entanglement. Robins are also extremely territorial, unlike most other song birds, with up to 10% of all deaths occurring due to fights.

I'm a Scientist, Get me out of here - 25 Genomes
25 Genomes, Sanger Science

We let the public decide five of our species….

By: Dan Mead, the 25th Anniversary Sequencing Project Coordinator
Date: 20/12/2017

We recently wrapped up the ‘I’m a scientist, get me out of here’ public engagement event. This was a fantastic exercise aimed at getting the public, specifically school children, excited about sequencing genomes and science in general.

Here’s how ‘I’m a scientist, get me out of here’ worked – 25 Genomes style:

We divided the species into five themes, each of which had their own ‘zone’:

  • Flourishing (species on the up in the UK)
  • Floundering (endangered and declining species)
  • Cryptic (species that are out of sight or indistinguishable from others based on looks alone)
  • Iconic (quintessentially British species that we all recognise)
  • Dangerous (invasive and harmful species)

In each zone were between 7 and 9 candidate species that had been proposed via an online poll of scientists, wildlife experts and interested members of the public.

Close, but no cigar…

The poll to suggest candidate species for the public vote ran throughout September and into early November and we had a pretty good response. Most of the replies were pretty sensible, and quite a few had very detailed justifications by experts (one ran to nearly 5,000 words, complete with references). But some suggestions were rather left field.

In the very first section of our explanation of the purpose of the poll, we say: “…we are embarking on a brand new project to sequence a cross-sample of UK biodiversity.”

Bearing this in mind I suspect some people weren’t that keen on reading or were just chancing their arm. Here are some of the more ‘exotic’ suggestions:

  • Resplendent Quetzal – a cool-looking bird, with a cool name. If you’ve not heard of it that’s because it lives in central America (not the UK).
  • The “Hoff” crab (Kiwa tyleri) – so named because of its hairy chest, reminiscent of Baywatch actor David Hasselhoff. The species can be found in UK overseas territorial waters, but it’s not in the UK.
  • Fire Salamander – yet another cool name, and it looks pretty sweet too. Unfortunately only found in mainland Europe.

Fire Salamander – pretty, but not UK-based. Image Credit: William Warby, Wikimedia Commons

Some non-UK resident species suggestions were a little easier to spot:

  • Greenland Shark
  • Mongolian Gerbil
  • Madagascar Paradise Flycatcher
  • Asiatic black bear
  • Italian Mediterranean buffalo
  • Alpine grasshopper
  • Tasmanian Devil (funnily enough, this species has already been sequenced right here at the Sanger Institute.)
  • Antarctic Krill

Back to I’m a scientist, get me out of here – 25 Genomes

The idea for the zones was that each species would be represented by a ‘champion’ (or team thereof) and they would answer in the first person, to keep things more fun and relatable. It worked well:

Screenshot of I'm a Scientist, Get me out of here - 25 Genomes online chat

While the ‘I’m a scientist, get me out of here – 25 Genomes’ event was running, anyone who logged on could vote for their favourite species, one vote per zone. When the vote was finished, the winning species from each zone was added to the 25 Genomes project.

Getting engaged with the students was the most successful way of winning. In all the zones, the species that were among the top two most active in the live chats, and that answered more questions on average, had a much better chance of becoming the zone winner.

The winners!

The winning 5 species of the public vote for the 25 Genomes Project – Common Starfish, Asian Hornet, Eurasian Otter, Fen Raft Spider, Lesser Spotted Catfish

In all around 5,000 people participated in the events and there were over 150,000 page views, which sounds pretty successful to me.

One final invaluable piece of information that I learned from this whole process is that the Latin name (Onopordum acanthium) for Scotch Thistle is “donkey fart thistle”. In ye olden times people used to think that donkeys fart a lot if they eat it*.

*from the iconic zone Q&A

Sanger Science

Big data for mosquito control

By: Alistair Miles
Date: 20/12/2017

Collecting mosquitoes via pyrethrum spray catch in The Gambia. Credit: Beniamino Caputo

Recently, Erica McAlister from the Natural History Museum in London posted a beautiful image of a specimen of Anopheles gambiae – the mosquito species that contributes most to malaria transmission in Africa – from the museum’s collection. It turns out this is probably the original type specimen, collected in 1900 by a zoologist called John Samuel Budgett during an expedition to The Gambia. In one of history’s tragic ironies, Budgett died a few years later from malaria, contracted while on another expedition.

Despite more than a century of study, there is still so much we don’t know about this deadly mosquito species. There are basic questions about life history and ecology, such as how they survive the dry season, whether they migrate and, if so, how far they can travel. There are questions about evolutionary history, such as why they have diversified into a cryptic species complex. And there are practical questions, like how insecticide resistance is spreading and what we can do about it.

A sample of female Anopheles gambiae mosquitoes. Credit: Martin Donnelly

If Anopheles gambiae did not transmit malaria, we could spend a very fulfilling academic career carefully unravelling the answers. Unfortunately we don’t have that luxury. The global campaign to eliminate malaria has made great strides over the last decade, but malaria remains a major disease burden in many parts of Africa and there is still a long way to go. And we are almost completely dependent on insecticide-based methods of mosquito control. We don’t yet know how badly insecticide resistance will impact on current control programmes, and there is a lot of debate and uncertainty about what should happen next. But most people agree that we cannot continue trusting blindly that the same insecticides we’ve been using for decades will continue to be effective.

One way to overcome uncertainty is to collect data. Lots of data. And that’s fundamentally what the Anopheles gambiae 1000 Genomes Project is about. In the first phase of the project, reported recently, we sequenced the genomes of 765 mosquitoes collected from field sites in 8 African countries. We then compared the sequences and discovered more than 52 million genetic variants. These data on genetic variation in natural mosquito populations can serve a range of purposes. For example, they can be used to study the evolution and spread of insecticide resistance, and inform the design of new mosquito control technologies based on gene drive. They also give us insights into the structure, size and history of mosquito populations, and provide evidence for substantial migration between populations. When we looked at some of the most rapidly evolving insecticide resistance genes, we found dramatic demonstrations of how mosquito populations across the continent are inter-connected, enabling resistance mutations to spread over thousands of kilometres.

Distribution of insecticide-treated bed-nets for malaria control. Credit: Martin Donnelly

Responding to insecticide resistance is a complex challenge, and there are no easy answers. Resources are finite, and need to be allocated wisely. If we want to bridge the gap between research and practice, we will need to collect more data to fill in the geographical gaps, and study how mosquito populations are changing over time. We also need to collect data from mosquito populations before, during and after specific control interventions, so we can measure the impact and learn which interventions are most (or least) effective. But the data we have collected so far demonstrate a clear path forward. By continuing to build a public resource of mosquito genome sequence data, and integrating with other data on ecology, malaria epidemiology and insecticide resistance phenotype, we hope to provide a source of much-needed intelligence to support the malaria elimination campaign in Africa.

This blog is reposted from Nature Ecology and Evolution – Behind the paper

About the Author:
Alistair Miles is Head of Epidemiological Informatics in the group of Dominic Kwiatkowski, at the University of Oxford, and the Wellcome Trust Sanger Institute.

 Related publication:
The Anopheles gambiae 1000 Genomes Consortium. Genetic diversity of the African malaria vector Anopheles gambiae. (2017) Nature DOI: 10.1038/nature24995

Further links: