The race to sequence SARS-CoV-2

While most science at the Sanger Institute was shutting down in March following the UK government’s response to COVID-19, several teams were preparing to work on coronavirus. The pivot from business-as-usual genome sequencing to sequencing SARS-CoV-2 viral genomes was quick.

The Sanger Institute is built for high-throughput science – but the project to sequence coronavirus demanded that more samples be handled, more quickly, than ever before. In the space of just a few weeks, new processes and pipelines were built, tested and implemented, and the data started flowing. We spoke to two of the Institute’s software developers about their time in lockdown, racing to develop software that could cope with the unprecedented numbers.

Steve Inglis (top middle), Katy Taylor (top right), and the Production Software team.

By Alison Cranage, Science Writer at the Wellcome Sanger Institute

High-throughput science needs high-throughput computing. The DNA pipelines production software development team is well practised, with some of the largest genome sequencing projects in the world up and running at the Institute – projects like UK Biobank, the Human Cell Atlas and the Cancer Genome Project. But software for these big data projects was carefully developed, tested, built and scaled up over time, a luxury that didn’t exist for sequencing SARS-CoV-2 genomes.

Steve Inglis is a Senior Software Developer. “I do remember thinking that working from home after the Campus closed might bring a nice change in pace; as much of the science was shutting down, we might have less to do. But a few days before we were due to leave site, we were told we were working on a new project to sequence coronavirus samples. We weren’t going to be slowing down.”

Together with other institutes across the UK, the Sanger Institute joined the COVID-19 Genomics UK Consortium – COG-UK. The group is sequencing the genomes of virus samples, enabling researchers to follow the evolution and spread of coronavirus. The viral genome data is combined with clinical and epidemiological datasets to help guide UK public health interventions and policies.

Katy Taylor is also a software developer in the team that supports DNA pipelines at the Sanger Institute. “In the early days of the project, the biggest challenge was trying to pin down what the first version of the software would do, when they were still trying to design a process in the first place. So we were doing it very much concurrently. It is normal for requirements for software to change, but the process we were trying to build software for has been constantly changing. Even now they are still making improvements. This is good for output – we can do higher throughput and we are prepared if there’s a second wave – but it means we’ve gone through a lot of iterations of the software already.”

For Steve, the challenges of the project are those shared by many during the pandemic. “From a personal perspective – though I think everybody’s the same – being in lockdown was hard enough, but being in lockdown under a massive amount of pressure to deliver something in probably half the time we would normally be expected to deliver it was tough. Especially early on. But everyone was in a similar place. Not seeing people in person meant it was harder to work things out than it would normally have been. Just getting through the day was sometimes hard. But we started living life on platforms like Zoom and Slack, and found new ways to communicate and collaborate, probably with a wider group of people than we ever have before.”

Physical to digital

A key part of the team’s job is to securely keep track of the thousands of samples as they flow through the Institute. Samples of DNA or RNA for sequencing start off as a physical entity, usually invisible to the naked eye, suspended in a tiny volume of liquid at the bottom of a tube or in one of hundreds of wells sunk into a small rectangle of plastic. Coronavirus samples are no different.

The samples arrive frozen in 96-well plastic plates. The difference is the volume – at one point there were 20 boxes a day, each containing 80 plates. Depending on where the sample is from, it undergoes different processing. Samples are received by laboratory teams who enter their tracking numbers into laboratory management systems, then inactivate them and prepare them for sequencing. Steps include replicating the virus’s genetic code, adding a molecular tag to each sample, pooling them and loading them ready for the sequencing machines.

Read by the sequencing machines, the samples then become digital versions of themselves. The data are transferred to a temporary storage area and quality control processes and primary analysis algorithms are run. Finally, the data are automatically transferred to a centralised portal for analysis by scientists in COG-UK. Every single step requires software.

Illustrations by Petra Korlevic

“When the DNA sequencing team first came to us about the new pipeline for coronavirus samples, they initially presented it to us as something that we had all the parts for already. There was a diagram of our existing pipelines and we thought maybe you can go along this route and then pop over to this pipeline and carry on. It was a nice idea but didn’t quite work out that way. We did use existing software, but there were a few extras. We had to figure a new route. We had to build new software to cope,” said Katy.

One of the new pieces of software meant that laboratory staff didn’t have to scan the barcode on every plate in every box, whilst standing in the -20°C freezer. “We were massively reducing the person-hours needed to get the job done. Automating as much of the process as possible has been crucial. And actually preventing frostbite,” Steve said.

“One of the interesting things that has come out of this is that we probably implemented some new features which will be used after the project. We’ve introduced automation into parts of our other pipelines to make life a lot easier for the users, and projects more efficient. Because we weren’t able to cope using the existing software, we had to repurpose it to make it faster and better,” added Katy. 

Others across the Institute agree: “We’ve been extremely excited by a recently completed piece of work delivered by the team. It is a game changer for our COVID sequencing work and also for our wider research sequencing processes,” said Emma Gray, a team leader in DNA pipelines.

“Over the past few months the work they have done on the COVID pipeline has been so valuable and hugely appreciated by the operational teams. The speed at which they have brought in new functionality really has been a game changer in this project – which is very fluid. There has been no additional difficulty in communicating what we need with the team working remotely – which is a significant accomplishment in itself. Most noteworthy functionality for me has been the ‘Sentinel cherrypicking workflow’ which has resulted in significant time savings. Work that would have taken two people a day to complete can now be done in under two hours by one person,” Emma added.

A changed team

The whole team of 13 software developers shifted to work on the new COVID sequencing pipeline back in March.

Katy reflected: “I was working on a few different projects before – everyone was. There were a lot of feature and process improvements. This has been a much more coherent project. We’re all working on the one same thing. It’s very focused.”

“As a team not only have we survived but we thrived. We had to – there was no choice,” Steve laughed. “We haven’t really worked as closely together as a team like that before.”

“I think the other thing is that it’s changed how things for us work, and how we will work in future. Before, I would say our processes were quite rigid. But now we’ve had to throw all that out of the window to cope. And I think that works. We might carry on doing that.”

Despite the stresses, both Steve and Katy agree that working on the project has been rewarding.

Katy said: “The best bit for me was at the start of the project, even though it was probably the most stressful part, it was also quite exciting. We built a pipeline really, really quickly. I think we got the first version out in two weeks. Designing that process in the first place was quite interesting.

“Also I think we all felt like our work was contributing. A lot of people’s work was stopping due to lockdown. But it was good to have lots of work to do, and for it to be relevant to the crisis.”

Steve agreed: “Realising that I was part of something that was so important was good. I’ve always felt like being at Sanger our work is quite important. To be part of something that could make a real difference to the global pandemic I think was quite satisfying and scary and exciting all at the same time, especially when we were finishing things. I remember thinking, yes, this is actually going to be used, and it might help.”