Experimental design and population data. Credit: Carneiro et al., Science/AAAS DOI: 10.1126/science.1253714
Sanger Science

How do animals become domesticated?

08 September 2014
By Bronwen Aken

Domestication is driven by small changes in many genes. Figure from companion paper, ‘On the origin of Peter Rabbit’ (http://www.sciencemag.org/content/345/6200/1000/F1.expansion.html). Credit: P. Huey/Science

The domestication of plants and animals many thousands of years ago revolutionised human societies and changed the course of history.

Have you ever wondered how this domestication occurs? What changes happen at a genetic level to make one animal tamer than another?

Rabbits were domesticated only 1,400 years ago at monasteries in southern France. They’re an excellent model for studying domestication because we know when and where they were domesticated, and also because wild populations still exist in the region.

I’m a bioinformatician at the EMBL-European Bioinformatics Institute (EBI) working as part of the Ensembl team. My team and I collaborated with researchers to understand what genetic changes took place when wild rabbits were domesticated.

The results of the consortium’s research, published in Science, show that rabbit domestication occurred due to small changes in many genes, and not due to large changes in a few genes.

These small changes, known as genetic variants, have altered frequencies in domesticated rabbits when compared to wild populations. Many of these genes are involved in the development of the brain and nervous system, which may explain the behavioural changes that we see in domestic rabbits such as a weaker flight response.

In addition, we observed more genetic changes to the genome in regions that do not code for proteins. This finding is particularly interesting in light of projects such as the ENCyclopedia of DNA Elements (ENCODE), which have collected copious data illuminating how regions of the genome outside of protein-coding genes play a vital role in gene regulation. This non-coding genetic variation could therefore control which genes are switched on or off or act like a volume control to adjust the level to which a gene is switched on or expressed.

Comparing wild and domesticated rabbits

First, DNA from one domestic rabbit was sequenced to build a high-quality reference genome assembly. Having a reference genome assembly is a critical first step in any genome analysis because it provides a basis against which other genetic data can be compared.

Next, the genomes of other rabbits were sequenced: six domestic breeds and wild rabbits from 14 different locations. This provided a fantastic opportunity to compare the genetic changes that occurred when wild rabbits were domesticated, because variants that are observed more or less frequently in the domesticated breeds than in wild populations are more likely to have contributed to domestication.
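To make the comparison concrete, here is a minimal Python sketch of ranking variants by how much their frequency differs between wild and domestic rabbits. It is an illustration only: the variant names, the allele counts and the function names are all invented, and the consortium's actual analysis was far more sophisticated.

```python
# Toy illustration: the counts below are made up, not data from the study.

def allele_frequency(alt_count, total_alleles):
    """Fraction of sampled alleles that carry the alternative variant."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return alt_count / total_alleles


def frequency_shift(wild, domestic):
    """Absolute difference in allele frequency between the two groups.

    Each argument is an (alt_count, total_alleles) pair.
    """
    return abs(allele_frequency(*domestic) - allele_frequency(*wild))


# Hypothetical variants: name -> (wild counts, domestic counts)
variants = {
    "variant_A": ((10, 100), (80, 100)),
    "variant_B": ((50, 100), (55, 100)),
}

# Rank variants so that the largest frequency differences come first.
ranked = sorted(
    variants,
    key=lambda name: frequency_shift(*variants[name]),
    reverse=True,
)

for name in ranked:
    print(name, round(frequency_shift(*variants[name]), 2))
```

In this toy data set, variant_A shifts far more than variant_B, so it would be the stronger candidate for further investigation.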

Ensembl’s role

In addition to having a reference genome assembly, researchers also need a well-characterised gene set. This was the part of the project that I was involved with. I co-managed the Ensembl gene annotation project for the rabbit.

Magali Ruffier in the Ensembl Genebuild team predicted the location of protein-coding genes for the rabbit by mapping known protein sequences from rabbit and other animals to the rabbit genome. This is a computationally expensive job and can take several months to complete. Daniel Barrell also mapped the consortium’s gene expression (transcriptome) data from ten tissues to the rabbit genome and incorporated these data into the gene annotation project.

Future directions

The genomic resources collected during this research provide a good basis for further study, including research into the early stages of species formation. Contrary to what you might expect, we did not find evidence to support the idea that domestication is driven by the inactivation of a small number of genes. Instead, the small changes (variations) that we observed in many genes were already present in the wild populations and have merely had their frequency altered in domestic populations. This means that rabbits may also be a good model for studying what happens when domestic and wild populations breed.

Bronwen Aken is the Primary Analysis Coordinator for the Ensembl project at EMBL-EBI. Ensembl is a genomic interpretation system, providing genome annotation, querying tools and access methods for chordates and key model organisms.


  • Carneiro, M et al (2014). Rabbit genome analysis reveals a polygenic basis for phenotypic change during domestication. Science. DOI: 10.1126/science.1253714
  • The ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature. DOI: 10.1038/nature11247
  • Carneiro, M et al (2014). The Genomic Architecture of Population Divergence between Subspecies of the European Rabbit. PLOS Genetics. DOI: 10.1371/journal.pgen.1003519

How researchers cracked the secrets of the turtle’s shell and answered some of evolution’s most intriguing puzzles

Cracking the secrets of the turtle’s shell

Turtles are unusual for animals with backbones. They are one of only three vertebrate species that have shells. Scientists have used genetics to discover when and how the shell appears in the developing turtle. [Image: Clunio, Wikimedia Commons]

10 June 2013

By Bronwen Aken

Did you know that turtles are unusual for animals with backbones? They are one of only three vertebrate species that have shells.

But where does the shell come from? When does it appear on a growing turtle embryo? And which animals are turtles most closely related to?

The answer lies in the turtles’ DNA and genes.

I’m a bioinformatician at the Wellcome Trust Sanger Institute working as part of the Ensembl team. My team and I collaborated with researchers from RIKEN (The Institute of Physical and Chemical Research, Japan) and BGI (formerly the Beijing Genomics Institute, China) in the Joint International Turtle Genome Consortium to understand when the Chinese softshell turtle appeared in evolutionary history and how its shell develops.

The results of the consortium’s research, published in Nature Genetics, show that a turtle’s shell is a late starter in development terms. Turtle embryos develop in a very similar way to most other vertebrates; it is only later on that the turtle-specific pathways are activated to grow the shell. We were also able to place Chinese softshell turtles on the evolutionary tree and support the theory that turtles belong to a branch that includes birds, crocodiles, and dinosaurs.

Which came first: the chicken, the turtle or the largest extinction event on the planet?

We sequenced the DNA from two turtle species so that we could understand which animals are most closely related to turtles. The more closely related two species are, the more similar the sequence of their genes will be.

The BGI team pieced together the full genomes of animals from the thousands of short lines of DNA code that modern DNA sequencing techniques produce. This painstaking work of assembling genomes can be frustrating – sometimes it is extremely hard to be certain which short fragment of DNA overlaps with another fragment – but the final results can be illuminating.
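As a rough illustration of what "overlapping" means here, the Python sketch below joins two short reads where the end of one matches the start of the other. It is a toy example with invented reads and hypothetical function names, not the assembly software used by the BGI team; real assemblers must also cope with sequencing errors and repeated sequences, which is where much of the uncertainty comes from.

```python
def overlap_length(left, right, min_length=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_length` bases exists.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_length - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_reads(left, right, min_length=3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_length)
    if size == 0:
        return None
    return left + right[size:]


# Invented example reads that share the four bases "GTAC".
read_1 = "ATGCGTAC"
read_2 = "GTACCTGA"
print(merge_reads(read_1, read_2))
```

The `min_length` threshold is an assumption of this sketch: very short overlaps occur by chance, so a minimum length helps to avoid joining fragments that do not really belong together.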

Using the resulting turtle genomes, the teams at Ensembl and BGI were able to search for protein coding genes. This was the part of the project that I was involved with. I managed the Ensembl gene annotation project for the turtle.

We predicted the location of protein-coding genes for the turtle by mapping known protein sequences from other animals to the turtle genome. This is a computationally expensive job and can take several months to complete. We also mapped the consortium’s gene expression (transcriptome) data to the turtle genome and incorporated these data into the gene annotation project.

Turtle protein-coding genes were then compared with genes from chickens, crocodiles, lizards, dogs and frogs. By comparing how similar the sequences are, we can estimate how closely related the different species are to each other. We found that the turtle lineage split away from the bird-crocodile lineage about 250 million years ago.
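The simplest measure of "how similar the sequences are" is the percentage of matching positions between two aligned sequences. The sketch below shows that calculation in Python on invented protein fragments. It is only an illustration: the species labels and sequences are made up, and estimating a split date such as 250 million years requires evolutionary models and calibration that go well beyond this.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions where two sequences carry the same residue.

    Both sequences must already be aligned to the same length; positions
    where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


# Invented, pre-aligned protein fragments (not real gene sequences).
aligned = {
    "species_1": "MKTAYIAKQR",
    "species_2": "MKTAYIAKQK",
    "species_3": "MRTSYLAKHK",
}

reference = aligned["species_1"]
for name, seq in aligned.items():
    print(name, percent_identity(reference, seq))
```

Here species_2 matches the reference more closely than species_3 does, which would suggest, all else being equal, that it is the closer relative.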

This date is the time of the Permian-Triassic extinction event: the largest known extinction event ever to take place on this planet. Could the emergence of turtles be related to this extinction event, which was particularly catastrophic for marine species?

Shell development

Turtles are unusual in the animal kingdom because they are born with a shell that houses their soft body. The shell is formed from modified ribs and backbone, so a turtle cannot crawl out of it. (Did you know that a turtle’s shoulder blades end up inside the shell and not outside of the ribs, as you would see in other vertebrates?)

As any embryo develops, specific genes are switched on and off in a highly regulated fashion to grow the various parts of the body. Considering that turtles and chickens have a common ancestor, at what point does a turtle embryo’s development take a different path from that of a chicken embryo? We are able to explore this from a gene expression perspective by measuring how active each gene is, using the full set of transcripts in a sample (known as the transcriptome).

By using the latest technology – next-generation transcriptome sequencing – we found that the turtle and chicken embryo gene expression patterns start to differ after the vertebrate phylotypic period of development. This is the moment just before the shell starts to form.

We looked at which genes are more highly expressed in turtle than chicken after the vertebrate phylotypic period, and where in the body these genes are switched on. We found evidence that the turtle embryo forms a shell by co-opting a gene expression pathway that is usually used to grow limbs.
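One common way to express "more highly expressed in turtle than chicken" is a log2 fold change between the two expression values for each gene. The Python sketch below shows the idea. The gene names, expression values, pseudocount and threshold are all assumptions made for illustration and are not taken from the study, which compared many genes across developmental stages with proper statistics.

```python
import math


def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """log2 ratio of two expression values.

    A pseudocount is added to both values so that zero expression does not
    cause a division by zero. Positive results mean higher expression in A.
    """
    return math.log2((expr_a + pseudocount) / (expr_b + pseudocount))


# Invented expression values for hypothetical matched genes.
expression = {
    "gene_X": {"turtle": 63.0, "chicken": 7.0},
    "gene_Y": {"turtle": 15.0, "chicken": 15.0},
    "gene_Z": {"turtle": 3.0, "chicken": 15.0},
}

# Assumed threshold: at least a two-fold difference (log2 value of 1 or more).
THRESHOLD = 1.0

higher_in_turtle = [
    gene
    for gene, values in expression.items()
    if log2_fold_change(values["turtle"], values["chicken"]) >= THRESHOLD
]

print(higher_in_turtle)
```

With these made-up numbers only gene_X passes the threshold; gene_Y is expressed equally in both animals and gene_Z is higher in chicken.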

Sniffing out an unexpected finding

Our study also showed that turtles may have the ability to smell a wide variety of substances because they possess a very large number of olfactory receptors. We identified more than 1,000 olfactory receptors in the Chinese softshell turtle, which is one of the largest numbers ever to be found in a non-mammalian vertebrate!

While my role in this project was small, I really enjoyed the work. It was great to be a part of an enthusiastic international consortium spread across eleven time zones, with members in the UK, Germany, China and Japan. Luckily, conference call scheduling was made easy because our collaborators in Japan and China were willing to stay up late until the UK crowd got in to work.

Bronwen Aken is the Wellcome Trust Sanger Institute Ensembl Genebuild Team Project Leader.

Research paper:

Wang Z, Pascual-Anaya JP, Zadissa A et al. The draft genomes of soft-shell turtle and green sea turtle yield insights into the development and evolution of the turtle-specific body plan. Nature Genetics 2013; 45: 701-706. DOI: 10.1038/ng.2615

Knowing zebrafish, knowing you – understanding zebrafish genomes, unlocking human health

Zebrafish are an ideal model organism for modelling the effects of genes on human health and disease. Credit: Genome Research Limited

30 July 2012

Written by Simon White

Studying zebrafish is a vitally important way to discover what role genes play in human health and disease. By linking human genes to their closest comparators in the zebrafish genome (orthologs) we are able to investigate what happens when a gene is lost or isn’t working the way it should.

However, to be able to do this, we need as complete a list of zebrafish genes as possible. I work in a team that uses RNA-Seq technology to automatically annotate genes and to determine the different ways genes work in different tissues (tissue-specific splice variation). RNA-Seq looks for the molecules (transcripts) that genes produce to make the proteins that control how a cell works. This information is invaluable in helping us to construct a complete catalogue of transcripts from the zebrafish genome.

The potential biological insights offered by RNA-Seq are considerable, but they come at a substantial cost in the effort needed to interpret the large volume of information it produces. Because this information is so useful, we set ourselves the goals of constructing a zebrafish gene set based on RNA-Seq alone and then identifying the highest quality models. These models would then be added to the core Ensembl gene-set (the reference set of zebrafish genes used by researchers around the world) in such a way as to increase the quantity and quality of the gene-set without adding artifacts or pseudogenes.

The results of our efforts were published recently in Genome Research, in a paper entitled ‘Incorporating RNA-seq data into the Zebrafish Ensembl Gene Build’.

The task of assembling RNA-Seq short reads into gene models is not trivial; in particular, we needed to overcome the issues of transcript contiguity and fragmentation. To achieve this, we employed the following approach.

We used Illumina paired-end sequencing to deep sequence a range of developmental stages and adult tissues, providing near complete coverage of the zebrafish transcriptome. We also performed an RNA-Seq three prime pull-down experiment that allowed us to identify the precise three prime ends of models.

In order to use these data sets to build gene models we created an analysis pipeline consisting of five steps:

  • alignment to the genome
  • processing alignments to construct basic transcript models
  • re-alignment of reads to basic transcripts to identify splice sites
  • refining basic transcripts using splice data to produce final transcripts
  • using pull-down data to modify the three prime ends of the transcripts.
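To give a flavour of the second step, the Python sketch below merges overlapping read alignments into blocks that could serve as the exons of a basic transcript model. This is an assumption about how such a step might work in principle, with invented coordinates and hypothetical names; it is not the code of the Ensembl pipeline.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals.

    Coordinates are inclusive. The result is sorted by start position.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous block to cover this alignment.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Invented genomic positions covered by aligned reads.
read_alignments = [(100, 150), (140, 190), (191, 220), (500, 550), (530, 580)]

exon_blocks = merge_intervals(read_alignments)

# Gaps between consecutive blocks are candidate introns.
candidate_introns = [
    (left_end + 1, right_start - 1)
    for (_, left_end), (right_start, _) in zip(exon_blocks, exon_blocks[1:])
]

print(exon_blocks)
print(candidate_introns)
```

A gap between two blocks is only a candidate intron: it could equally be a region with no read coverage, which is why the pipeline's later steps re-align reads to identify splice sites and refine the models.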

In total, we compared a sample set of 8,822 cDNAs to the RNA-Seq models and found that 95 per cent of the cDNA introns were present in our RNA-Seq set. We also found that 83 per cent of the cDNAs were reproduced perfectly in the RNA-Seq set at the level of the coding sequence of the transcript (the bit that actually codes for protein).
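A percentage like the 95 per cent above comes from asking what fraction of a trusted reference set also appears in the predicted set. The Python sketch below shows that calculation on a handful of invented introns; the coordinates, chromosome names and function name are hypothetical and chosen only for illustration.

```python
def fraction_recovered(reference_items, predicted_items):
    """Fraction of reference items that are also present in the predictions."""
    reference = set(reference_items)
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(reference & set(predicted_items)) / len(reference)


# Invented introns, each described as (chromosome, start, end).
cdna_introns = {
    ("chr1", 221, 499),
    ("chr1", 900, 1200),
    ("chr2", 50, 300),
    ("chr3", 10, 90),
}
rnaseq_introns = {
    ("chr1", 221, 499),
    ("chr1", 900, 1200),
    ("chr2", 50, 300),
    ("chr5", 400, 700),  # present only in the RNA-Seq set
}

recovered = fraction_recovered(cdna_introns, rnaseq_introns)
print(f"{recovered:.0%} of cDNA introns found in the RNA-Seq set")
```

Introns that appear only in the RNA-Seq set do not affect this measure, since it asks only how much of the reference was reproduced.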

Many of the RNA-Seq generated models appeared to be fragments and we needed to remove them by filtering our results before we could include our findings in the Ensembl gene set. However, despite these fragments, we were able to create a significant number of full-length transcript models and 8,374 of these were added to the core Ensembl gene-set. In addition to this we generated a wealth of tissue-specific splice variation data.

By improving the quality and coverage of the zebrafish gene annotation we have provided a useful resource for researchers who wish to verify the activity of genes implicated in human disease. In addition, we have generated a high-quality RNA-Seq gene annotation pipeline that is now routinely used in Ensembl annotation and is proving particularly useful for species with very little protein or cDNA evidence.

In addition, the significant number (>1,000) of novel models from RNA-Seq that were absent from the zebrafish cDNAs suggests that the deep sequencing offered by RNA-Seq can be used to expand the gene annotation of even well-studied model organisms.

We hope that our approach will help refine the use of RNA-Seq in the Ensembl gene build process for new species. It also gives us the opportunity to rapidly update old gene-sets for which there is unlikely to be a full gene-build in the foreseeable future.

Simon White works in the Ensembl Genebuild team at the Institute, where he develops and runs pipelines for automated genome annotation.
