Image credit: Luke Lythgoe / Wellcome Sanger Institute

Categories: Sanger Science | 6 April 2023 | 8.9 min read

Building a bioinformatics community to tackle genomic questions at scale

Over 60 bioinformaticians and developers from across the Sanger Institute and EMBL’s European Bioinformatics Institute (EMBL-EBI) came together on 27-29 March to take part in the nf-core hackathon. This biannual global event is part of a project building a worldwide community of people using Nextflow. This informatics tool helps scientists analysing genomes to more easily share and build upon each other’s work and ideas.

The event at the Wellcome Genome Campus was the largest gathering at any site during the international nf-core hackathon. The purpose is for scientists to tackle problems collaboratively and share best practices. Over the three-day period, in-person events were held in 12 countries across four continents, with many more participants online.

Building a Campus community

The nf-core project is a diverse global community of developers. It promotes open science and collaboration via contributions and feedback from the scientific community. But while the project supports the community on a global scale, the organisers of the Wellcome Genome Campus gathering are equally keen to build a more modest Nextflow user community right here.

The teams brought together during the nf-core hackathon are involved in a wide array of biological research. Within the Sanger Institute, bioinformaticians are studying a huge variety of genomic data. This includes researching the human genome, cancer and ageing, and how human cells function. Others are monitoring infectious diseases, or looking at the genomes of parasites and microbes, as well as those of every species of plant, animal and fungus in the natural world.

Teams joining the nf-core hackathon from EMBL-EBI are working on a similarly diverse range of projects. One team manages a database which helps predict genetic predisposition to different inherited diseases. Another is annotating the genomes of the different species generated at Sanger, making sense of the data and what each part of the genome actually does. Yet more teams are responsible for storing genome data on international archives, ensuring it can be easily and openly accessed, and creating metadata standards.

A common thread running through all these teams is how they benefit from Nextflow and nf-core.

“Reproducible research and analysis is a foundational principle of the scientific method and as such we should always work towards adopting tools that help us to achieve this. Nextflow and nf-core provide a useful way to achieve this and share our analyses and workflows with other researchers around the world,” says Andrew Yates, Team Leader in Genomics Technology Infrastructure at EMBL-EBI. “This not only accelerates science but allows introspection of our methods. Supporting this growing community is the right thing to do as we work towards making the ecosystem around workflows better.”

What is Nextflow and why do scientists like it?

Bioinformaticians need to make many types of software do a variety of different things, often all at once, in order to analyse huge amounts of biological data. They do this by writing code that organises different software tools into ‘pipelines’ or ‘workflows’. Instead of each tool having to be operated manually, a pipeline passes the output of one tool automatically into the next. This saves time and effort, and means research can be more ambitious in scale.

However, problems can arise when you try to move a pipeline. For example, maybe you developed a pipeline in the cloud but now want to use it on your laptop. Or you would like to share your cleverly coded pipeline with a colleague who uses a different operating system. This can be a headache because different systems all talk in slightly different coding languages.

This is where Nextflow comes in. Nextflow is an example of what developers call a workflow framework or workflow manager. Essentially it allows you to be a bit more nimble with how and where you use pipelines.

Nextflow does several things bioinformaticians really like. It is portable, meaning it can be moved easily between different systems. It is agnostic in its language, speaking its own language which is compatible with many different systems. It is modular, which allows users to add and remove different software tools at any point in the pipeline without causing problems elsewhere. And it is scalable, which allows pipelines using Nextflow to take on lots of additional workload without having to be constantly modified.
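To make the idea of a modular pipeline a little more concrete, here is a minimal sketch in Python. It is not Nextflow code and not how Nextflow is implemented. The step names (`trim_reads`, `drop_short_reads`, `gc_content`) and the toy DNA reads are invented for illustration. It simply shows each step feeding into the next, and a step being removed without touching the others.

```python
from typing import Any, Callable, Sequence


def trim_reads(reads: list[str]) -> list[str]:
    """Strip trailing 'N' characters (unknown bases) from each read."""
    return [read.rstrip("N") for read in reads]


def drop_short_reads(reads: list[str]) -> list[str]:
    """Keep only reads that are at least four bases long."""
    return [read for read in reads if len(read) >= 4]


def gc_content(reads: list[str]) -> list[float]:
    """Fraction of G and C bases in each read (0.0 for an empty read)."""
    return [
        (read.count("G") + read.count("C")) / len(read) if read else 0.0
        for read in reads
    ]


def run_pipeline(data: Any, steps: Sequence[Callable[[Any], Any]]) -> Any:
    """Pass the data through each step in order; each output feeds the next step."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical toy input: three short DNA reads.
reads = ["GATTACANN", "GCN", "GGCC"]

# Full pipeline: trim, filter, then measure.
full = run_pipeline(reads, [trim_reads, drop_short_reads, gc_content])

# Same pipeline with the filtering step removed; the other steps are unchanged.
without_filter = run_pipeline(reads, [trim_reads, gc_content])

print(full)
print(without_filter)
```

A workflow manager such as Nextflow applies the same principle to real software tools, and also handles where and how each step runs.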

An increasing workload is not unusual in genomics. The Darwin Tree of Life project, on which both Sanger and EMBL-EBI collaborate, is a great example. The research goal is to generate and annotate genomes for all eukaryotic organisms living in Britain and Ireland, perhaps 70,000 species. At the beginning of the project, workloads are expected to be light as scientists focus on experimenting with the techniques needed to sequence such a diverse range of lifeforms. As these skills are mastered, however, the expectation is that hundreds or even thousands of genomes will be assembled over relatively short periods of time. Bioinformaticians need to be able to scale up their efforts easily and repeatedly over the course of this decade-long project.

"Our goal for this hackathon is to build a long-lasting campus community of developers of pipelines, associated tools and informatics infrastructure, as well as Nextflow users. It was amazing to see so many colleagues, drawn from different institutes and scientific programmes, coming together to share ideas. You can feel a real momentum building and strong spirit of collaboration."

Priyanka Surana
Senior bioinformatician at the Sanger Institute, and a driving force behind the nf-core hackathon event on the Wellcome Genome Campus

Pizza and pipelines

“As bioinformatics continues to evolve quickly we recognise how essential communities, such as this one, are,” explains John Boyle, Associate Director of Science Solutions at the Wellcome Sanger Institute. “This community provides the means for us to share and advance knowledge with our peers. It also provides a forum for us to understand best practice, so that we can improve how we use workflow tools to rapidly put together state of the art analysis pipelines, which are required for us to undertake cutting edge science.”

The nf-core hackathon happens at least twice a year, but teams at Sanger and EMBL-EBI are enthusiastic about bringing their community together on an even more regular basis. This new workflow community is already planning more hackathons, training on advanced topics, talks and discussion sessions, and after-work meetups with pizza and drinks.

The range of research across the Wellcome Genome Campus is vast, but it all has one thing in common: this is genomics on a massive scale. That requires efficient informatics that can be ramped up easily and rapidly. Tools like Nextflow allow our bioinformaticians to do this, but a strong community is what will drive innovation into the future.
