Bioinformaticians need to make many types of software do a variety of different things, often all at once, in order to help them analyse huge amounts of biological data. They do this by writing code that organises different software tools into ‘pipelines’ or ‘workflows’. Instead of each tool having to be operated manually, a pipeline automatically passes the output of one tool to the next. This saves time and effort, and means research can be more ambitious in scale.
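To make the idea of one tool feeding into the next concrete, here is a minimal sketch in plain Python. It is an illustration only, not Nextflow code: the three step functions (`trim_reads`, `filter_short`, `count_gc`) and the `run_pipeline` helper are hypothetical stand-ins for real bioinformatics tools, and the toy DNA reads are made up.

```python
from typing import Any, Callable

# Hypothetical stand-ins for real tools; each takes the previous step's output.
def trim_reads(reads: list[str]) -> list[str]:
    """Drop the last base of every read, as a toy version of quality trimming."""
    return [read[:-1] for read in reads]

def filter_short(reads: list[str], min_length: int = 4) -> list[str]:
    """Keep only reads that are still at least min_length bases long."""
    return [read for read in reads if len(read) >= min_length]

def count_gc(reads: list[str]) -> dict[str, int]:
    """Count the G and C bases in each read."""
    return {read: sum(base in "GC" for base in read) for read in reads}

def run_pipeline(data: Any, steps: list[Callable[[Any], Any]]) -> Any:
    """Run the steps in order, handing each step's output to the next."""
    for step in steps:
        data = step(data)
    return data

# Toy input: three short reads (made-up data).
result = run_pipeline(["ACGTG", "GGC", "TTGCA"], [trim_reads, filter_short, count_gc])
print(result)  # {'ACGT': 2, 'TTGC': 2}
```

Nobody has to copy results from one step to the next by hand: the order of the list is the pipeline.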
However, problems can arise when you try to move a pipeline. For example, maybe you developed a pipeline in the cloud but now want to use it on your laptop. Or you would like to share your cleverly coded pipeline with a colleague who uses a different operating system. This can be a headache because systems differ in the software they have installed, the commands they understand and the way they run jobs, so a pipeline written for one system often needs reworking before it will run on another.
This is where Nextflow comes in. Nextflow is an example of what developers call a workflow framework or workflow manager. Essentially, it lets you write a pipeline once and then be more nimble about how and where you run it.
Nextflow does several things bioinformaticians really like. It is portable, meaning pipelines can be moved easily between different systems. It is language-agnostic: the individual steps of a pipeline can be written in whichever scripting language suits them, while Nextflow’s own language ties the steps together and runs on many different systems. It is modular, which allows users to add and remove software tools at any point in the pipeline without causing problems elsewhere. And it is scalable, which allows pipelines using Nextflow to take on much larger workloads without having to be constantly modified.
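The modular and scalable properties can be sketched in a few lines of plain Python. This is not Nextflow syntax and says nothing about how Nextflow is built internally; the step functions, the `run_pipeline` and `run_many` helpers and the sample sequences are all hypothetical, chosen only to show an extra step being slotted in without touching the others, and the same pipeline being applied to many samples at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Hypothetical steps standing in for real tools.
def uppercase(seq: str) -> str:
    return seq.upper()

def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))

def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)

def run_pipeline(sample: Any, steps: list[Callable[[Any], Any]]) -> Any:
    """Pass one sample through every step in order."""
    for step in steps:
        sample = step(sample)
    return sample

def run_many(samples: list[str], steps: list[Callable[[Any], Any]], workers: int = 4) -> list[Any]:
    """Apply the same pipeline to many samples concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda sample: run_pipeline(sample, steps), samples))

# Modular: inserting a step in the middle leaves the other steps unchanged.
base_steps = [uppercase, gc_fraction]
extended_steps = [uppercase, reverse_complement, gc_fraction]

# Scalable: the pipeline definition is the same for 3 samples or 3,000.
samples = ["acgt", "ggcc", "atat"]  # made-up sequences
base_results = run_many(samples, base_steps)
extended_results = run_many(samples, extended_steps)
print(base_results)  # [0.5, 1.0, 0.0]
```

Growing the workload here means passing a longer list of samples or raising `workers`; the steps themselves are not rewritten.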
An increasing workload is not unusual in genomics. The Darwin Tree of Life project, on which both Sanger and EMBL-EBI collaborate, is a great example. The research goal is to generate and annotate genomes for all eukaryotic organisms living in Britain and Ireland, perhaps 70,000 species. At the beginning of the project, workloads are expected to be light as scientists focus on experimenting with the techniques needed to sequence such a diverse range of lifeforms. As these skills are mastered, however, the expectation is that hundreds or even thousands of genomes will be assembled over relatively short periods of time. Bioinformaticians need to be able to scale up their efforts easily and repeatedly over the course of this decade-long project.