Categories: Sanger Science
16 October 2025

“I’m not trying to do science just for scientists. I’m doing it to try to solve important problems that will make a difference”: Jussi Taipale on genomics at scale and predicting gene expression

By Katrina Costa, Science Writer, Wellcome Sanger Institute

Professor Jussi Taipale joined the Wellcome Sanger Institute in January this year as a Senior Group Leader in the Generative and Synthetic Genomics Programme. His ambition? To predict gene expression directly from DNA sequence. In this blog, he reflects on his international research journey, how biology is scaling up, and why his research group aims to solve one of the biggest puzzles in genomics: how DNA sequence predicts gene activity.

The genetic code can be read like a language – we know the letters, or the nucleotide bases (A, T, C, and G), and words, which are elements influencing gene expression, such as transcription factor binding motifs and amino acid codons. What we understand less is the grammar – the regulatory rules, for example, the order and spacing of the different DNA words that can make functional regulatory elements and proteins. However, Jussi emphasises that we must understand the DNA sequence's meaning, especially in cells. The 'meaning' of gene regulatory DNA is how it controls when, where, and how much a gene is turned on. Gene expression control has driven most of Jussi's research career. Biologists can spot the key players – such as promoters that initiate gene expression and enhancers that boost it – but Jussi argues that it is like recognising words without understanding sentences. His long-term mission is to build a computational model that accounts for cellular context and can forecast gene activity. He believes this will deepen our understanding of biology, reveal disease-linked variants and enable predictable genome design.
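
To make the 'words' and 'grammar' idea concrete, here is a toy sketch in Python. It is an illustration only, not a method from Jussi's group. It treats two well-known consensus motifs (the CRE site TGACGTCA and the E-box CACGTG) as exact 'words' and reports the spacing between them. The function names and the example sequence are invented for this illustration, and real binding sites are degenerate rather than exact matches.

```python
import re

def find_motif(seq, motif):
    """Return 0-based start positions of every (possibly overlapping) exact match."""
    # A zero-width lookahead lets re.finditer report overlapping matches too.
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]

def spacing(seq, motif_a, motif_b):
    """Return the gap, in bases, between each motif_a site and each downstream motif_b site."""
    gaps = []
    for a in find_motif(seq, motif_a):
        for b in find_motif(seq, motif_b):
            gap = b - (a + len(motif_a))  # bases between the end of A and the start of B
            if gap >= 0:
                gaps.append(gap)
    return gaps

# Hypothetical sequence, invented for illustration: one CRE site, then one E-box.
seq = "TTGACGTCAAAACACGTGTT"
cre_sites = find_motif(seq, "TGACGTCA")
ebox_sites = find_motif(seq, "CACGTG")
gaps = spacing(seq, "TGACGTCA", "CACGTG")
```

In this example the two 'words' are each found once, separated by three bases. A regulatory grammar would ask whether that order and spacing change how strongly the gene is switched on.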

Cracking the code: How DNA sequence predicts gene activity

One of the fundamental challenges in biology is the sequence-to-expression problem – the gap between knowing a DNA sequence and predicting gene activity in a cell. Jussi explains: “There are over 1,600 transcription factor proteins that bind to specific DNA sequences, and they all work together to regulate gene activity. That complexity is one reason this problem has been so hard to solve.”

Tackling this challenge is the core focus of Jussi’s research group at the Sanger Institute. The group aims to uncover how cellular growth is controlled during human development and compare this to what goes wrong in cancer. This will help reveal how genetic variation impacts our risk of cancer. Building on the Sanger Institute’s AI strategy, Jussi combines laboratory experiments with computational biology to predict gene expression and cancer risk, collaborating with other teams across the Institute and internationally.

Jussi believes gene activity can be predicted from sequence alone. His approach is to measure the activity of thousands of random DNA sequences simultaneously, then use the data to build powerful predictive models of how transcription factor binding sites interact. He says, “We can’t test every possible DNA sequence, so we build mechanistic models – like physicists predicting the summer solstice from a model rather than running an experiment.” He calls this tackling the ‘second genetic code’, a reference to the gene regulation that works beyond the genome's protein-coding instructions. “We use completely synthetic DNA to learn the regulatory code. The computer can’t ‘cheat’ by memorising, because those sequences don’t exist in the genome.” By understanding and predicting gene expression patterns, scientists can advance precision medicine and develop new treatments for diseases such as cancer.
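
The logic of learning from synthetic sequences can be sketched with a deliberately tiny Python example. Everything in it is an assumption made for illustration: the 'measurement' is an invented rule (a baseline plus a fixed boost per E-box site), the model is a straight-line fit, and all names are hypothetical. The group's real experiments and models are far richer than this.

```python
import random

MOTIF = "CACGTG"  # E-box consensus, used here purely as an example 'word'

def count_motif(seq, motif=MOTIF):
    """Count occurrences of motif in seq (overlapping matches allowed)."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))

def random_dna(rng, n):
    """Return n random bases drawn uniformly from A, C, G, T."""
    return "".join(rng.choice("ACGT") for _ in range(n))

def synthetic_sequence(rng, copies, flank=10):
    """Build random flanks with `copies` planted motif sites between them."""
    parts = [random_dna(rng, flank)]
    for _ in range(copies):
        parts.append(MOTIF)
        parts.append(random_dna(rng, flank))
    return "".join(parts)

def toy_expression(seq, baseline=1.0, per_site=2.5):
    """INVENTED ground truth standing in for a laboratory measurement."""
    return baseline + per_site * count_motif(seq)

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(0)  # fixed seed so the synthetic library is reproducible
library = [synthetic_sequence(rng, i % 4) for i in range(200)]
counts = [count_motif(s) for s in library]
measured = [toy_expression(s) for s in library]
intercept, slope = fit_line(counts, measured)

# A sequence that is not in the training library: the fitted rule still applies.
unseen = "AAAA" + MOTIF + "TTTT" + MOTIF + "GG"
predicted = intercept + slope * count_motif(unseen)
```

Because the toy data contain no noise, the fit recovers the invented baseline and per-site effect, and the prediction for the unseen sequence follows from the learned rule rather than from having seen that sequence before. That is the sense in which a model trained on synthetic DNA cannot 'cheat' by memorising the genome.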

Jussi's research journey from Helsinki to Hinxton

To understand how Jussi reached this ambitious goal, it helps to trace his international career path. Originally from Finland, he earned his Master's in biochemistry at the University of Helsinki, where cell signalling – how cells communicate – captivated him. For his PhD, he moved to the Department of Virology to explore proteins called growth factors (GFs) that change cell activity. GF signalling pathways are frequently found to be disrupted in tumour cells. Specifically, he studied how latent transforming growth factor-β (TGF-β) is stored in the extracellular matrix – the supportive environment between cells – and can change the growth of cells when activated.1

Building on his interest in GFs, Jussi travelled to Johns Hopkins University in Baltimore, USA, to study another GF called Sonic Hedgehog and the signalling pathway it activates, which guides embryonic tissue development. Despite its video-game-inspired name, Sonic Hedgehog is a serious drug target. Jussi’s postdoctoral research explored how changes to Sonic Hedgehog can trigger cancer, such as basal cell carcinoma in the skin and medulloblastoma in the cerebellum.2 Several anti-cancer drugs are now available that target this pathway.3

This postdoctoral work strengthened his interest in studying cancer biology to decode gene regulation. In 2003, Jussi established his own research group at the University of Helsinki, which aimed to identify target genes in cell signalling that drive tumour growth across different types of cancer.

He continued this research theme when he joined the Karolinska Institute in Sweden in 2009. Here, his work focused on identifying growth inhibitors that are shared between different cancers to uncover broad drug targets. His team discovered that deleting a large regulatory region upstream of the MYC cancer gene (the “MYC super-enhancer”) made mice resistant to tumour formation, whilst normal cell growth continued.4

In the early 2000s, researchers systematically knocked out genes to decode their function, but progress was slow. As sequencing technologies became more powerful, Jussi switched tactics by gathering massive, unbiased datasets first, then using computational analysis to uncover patterns in the data. By 2017, he had set up his group at the University of Cambridge's Department of Biochemistry, fully committed to studying transcriptional regulation at scale.

Jussi joined the Sanger Institute in Hinxton earlier this year for one main reason: scale. His approach of testing thousands of DNA sequences in parallel demanded infrastructure and AI capabilities that few scientific institutions can match. Scale only works with partnerships, and he was impressed by the breadth of expertise he could readily access across the Wellcome Genome Campus.

“What interests me about the Sanger Institute is the alignment: people tackling similar questions, at scale, with large datasets and strong computational tools.”

Working with Professor Ben Lehner, Head of Generative and Synthetic Genomics at the Sanger Institute, Jussi generates massive protein-DNA binding datasets to understand protein function. Together, they are investigating how protein sequence impacts the binding of transcription factors to different stretches of DNA. Alongside this, he is also working with Professor Muzz Haniffa, Head of Cellular Genomics, and Group Leader Mo Lotfollahi, integrating single-cell transcriptomics and AI to understand gene regulation in cells. Meanwhile, Dr Mathew Garnett, group leader in Translational Cancer Genomics, works with Jussi on cancer cell models that help test how transcription factor binding drives tumour growth.

Biological prediction and the experimental roadmap

Jussi's group pursues two linked goals: predicting gene expression from DNA sequence, known as sequence-to-expression, and predicting DNA binding from protein sequence, which is sequence-to-affinity.

Ultimately, his ambition is to establish a regulatory grammar for transcription, or the rules that control gene activation. This grammar could help predict gene expression to reveal deeper biological insights and help guide clinical treatment for diseases such as cancer.

New single-molecule sequencing techniques enable his team to track individual DNA-protein binding events rather than averaging across thousands, providing greater precision. Other methods profile which transcription factors bind across an entire cell's genome, providing a whole-cell regulatory snapshot.

Jussi’s group now integrates AI into their experimental workflows, modelling entire systems rather than individual elements. The next steps will be to complete a DNA–protein binding atlas and build genome-wide models to flag disease-causing variants and design therapeutic treatments.

Jussi's philosophy: solve problems that matter

To tackle large projects like these, he has a guiding philosophy that his PhD advisor, Jorma Keski-Oja, and his postdoc advisors, Kari Alitalo and Philip A. Beachy, taught him: chase important problems, and do not get lost in the details.

Scientists can often follow a path to the easiest next experiment, but Jussi warns: "The risk in science is that you always see an obvious future direction. But before you know it, you're on an unpaved road somewhere at a dead end." So, he chooses to focus on impact, not just publications.

“I’m not trying to do science for scientists. I’m doing it for the world, to make a difference. I advise people to always have the bigger application further down the line in your mind as you work.”

Advice for early-career scientists

Jussi's advice to scientists at the start of their careers is: learn to code. Even experimental scientists need sufficient computational skills to design experiments that yield meaningful data and insights. When scientists use experiments and data analysis to inform each other, both can improve.

“People who can do both are very successful in our line of work because they also understand how to design experiments that they can later analyse. This is especially true for working with AI, which requires data structuring and preparation skills. You need to have computational thinking informing the experimental design so you are not producing something unusable that you cannot analyse.”

Towards a genomics Rosetta Stone

After more than two decades and four countries, Jussi has uncovered key regulatory elements and mechanisms shaping gene expression. However, predicting gene expression genome-wide requires a final leap. With the Sanger Institute’s growing strengths in AI and capacity for science at scale, his group is getting closer to building the tools needed to interpret any stretch of DNA – natural or synthetic. It is an ambitious vision, but discovering the key to unlock DNA's regulatory language, just as the original Rosetta Stone allowed scholars to decipher Egyptian hieroglyphs, could help transform how we understand health and disease at the molecular level.