

Professor Ben Lehner joins the Sanger Institute as a new Senior Group Leader in the Human Genetics Programme.
Proteins, encoded by genes, are the building blocks of all life on Earth, yet they remain elusive in many ways. It is not yet possible to predict their activities or functions based on the DNA instructions that code for them. It isn’t possible to say how a single genetic change, or mutation, may have an effect in a cell. It is extremely difficult to design new proteins.
For Professor Ben Lehner, new Senior Group Leader at the Sanger Institute, the issue is data. With enough data, scientists can use artificial intelligence to create mathematical models that predict how a protein behaves – based on its DNA sequence alone. His goal is to generate the data, make the models, and enable the next generation of drug discovery and bioengineering.
“We are truly delighted to welcome Ben to Sanger. Ben is a hugely creative thinker and leader, who has had major impacts across a broad scientific range. We are really excited by the ambitious research plans Ben has for his group, leveraging Sanger’s core capabilities in scaling genomic technologies, and anticipate that Ben’s influence on our wider Sanger community will catalyse new and innovative collaborative initiatives.”
Professor Matt Hurles
Head of the Human Genetics Programme, Wellcome Sanger Institute
Ben joins the Sanger Institute Human Genetics Programme from the Centre for Genomic Regulation (CRG), Spain. Ben has been awarded many prestigious prizes during his career so far, including the Gold Medal from the European Molecular Biology Organization (EMBO). We spoke to him about his research plans at the Institute.
What issues are you tackling in your work to map how DNA changes influence proteins?
I think it’s three things. It’s understanding. It’s also the ability to predict. And then it's the ability to engineer.
I think we have quite a good understanding of lots of things in biology, at a conceptual level, or at a descriptive level. But actually, we're pretty terrible at making any kind of predictions. It's still quite hard to predict what happens when you make one mutation in a famous disease protein that everyone studies.
Engineering biology is still really difficult. It's sort of embarrassingly difficult. We just can’t do it like we can engineer software or bridges or aeroplanes, where you can take parts and put them together.
Even though things have changed quite a lot in the last few years with deep learning, designing a single protein where you're tweaking what it does, or say, changing its binding ability to an antibody – it's really difficult.
Biology is a high-dimensional problem. If you take a small protein, which is made of say 100 amino acids, there are 20^100 ways of making that. Which is more than the number of atoms in the universe. So you can never computationally, or experimentally, explore even a tiny fraction of those combinations. You could start, but the universe would end before you finished.
And so to predict things, you need to have computational models. I think basically, these problems have been data-limited, and we haven't been generating enough data of the right type to build these models – and that is what I would like to contribute to. Sanger is a natural place to generate these data on a huge scale.
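The scale argument in this answer can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the 20 standard amino acids and a 100-residue protein, as in the example, and uses 10^80 as a rough order-of-magnitude estimate for the number of atoms in the observable universe. The function names are hypothetical and chosen for this example.

```python
import math

ALPHABET_SIZE = 20            # the 20 standard amino acids
PROTEIN_LENGTH = 100          # the "small protein" used as the example in the text
ATOMS_IN_UNIVERSE = 10 ** 80  # assumption: rough order-of-magnitude estimate


def sequence_space_size(length: int, alphabet: int = ALPHABET_SIZE) -> int:
    """Number of distinct sequences of the given length."""
    return alphabet ** length


def n_mutants(length: int, order: int, alphabet: int = ALPHABET_SIZE) -> int:
    """Number of variants differing from one reference sequence at exactly `order` positions."""
    return math.comb(length, order) * (alphabet - 1) ** order


total = sequence_space_size(PROTEIN_LENGTH)
singles = n_mutants(PROTEIN_LENGTH, 1)
doubles = n_mutants(PROTEIN_LENGTH, 2)

print(f"sequence space: about 10^{len(str(total)) - 1}")
print(f"exceeds atom estimate: {total > ATOMS_IN_UNIVERSE}")
print(f"single mutants: {singles:,}")
print(f"double mutants: {doubles:,}")
```

Under these assumptions the full sequence space is on the order of 10^130, while one reference protein has only 1,900 single mutants and 1,786,950 double mutants. That gap is why exhaustive exploration is out of reach but measuring all singles and many doubles is feasible.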
What approach will you take?
Over the last few years, we’ve been developing assays (methods) to study different aspects of molecular biology. We've tested lots of selection assays for measuring protein stability – do mutations change the folding of proteins, for example? We’ve tested assays to measure how mutations affect a protein’s binding abilities. We've developed assays for looking at how proteins aggregate – which is really important in many neurodegenerative diseases.
The readout of these assays uses high-throughput DNA sequencing. What I'm most excited about is really what's changed in the last five years or so, enabling us to work at scale. Because DNA synthesis is cheap, and DNA sequencing is very cheap, we can do millions of experiments in parallel. And these are super, super quantitative experiments, and very, very quick and easy. These aren’t just descriptive experiments, these are experiments where we're changing something, and we're measuring what happens. I think finally, genomics is doing molecular biology.
Once we have these datasets, the world is quite good at making computational models that can predict what happens, based on defined parameters. There are fast, flexible, deep learning algorithms that have been developed by Google and Facebook, for example, which we will use alongside other methods to build generative, or predictive, models.
Measuring the effects of DNA mutations is important in human genetics. In the study of many developmental diseases, a single DNA change may be responsible for a condition. But I’m also interested in the evolution and engineering problem, which is what happens when I make two mutations in the same protein. Or three, or five. How can we predict what happens when you start combining things in lots of different combinations? That's extremely important to understand molecular engineering.
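One common way to frame the question of how mutations combine is to compare a measured double mutant with what an additive model would expect. The sketch below is a minimal illustration, not the group's actual method: it assumes mutation effects are expressed relative to the unmutated protein on a scale where independent effects add (for example a log-transformed measurement), and all numbers and function names are hypothetical.

```python
def additive_expectation(effect_a: float, effect_b: float) -> float:
    """Expected double-mutant effect if the two single-mutant effects simply add."""
    return effect_a + effect_b


def interaction_score(effect_a: float, effect_b: float, effect_ab: float) -> float:
    """Deviation of the measured double mutant from the additive expectation.

    Zero means the mutations combine additively; a non-zero value
    means they interact.
    """
    return effect_ab - additive_expectation(effect_a, effect_b)


# Hypothetical measurements, relative to the unmutated protein (0.0 = no change).
effect_a = -0.5    # mutation A alone
effect_b = -0.25   # mutation B alone
effect_ab = -1.5   # both mutations together

expected = additive_expectation(effect_a, effect_b)
score = interaction_score(effect_a, effect_b, effect_ab)
print(f"expected if additive: {expected}, measured: {effect_ab}, deviation: {score}")
```

In this made-up case the double mutant is worse than the sum of its parts. A predictive model for combinations of three or five mutations would need to capture deviations like this one rather than assume simple additivity.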
What got you interested in this area of biology?
I've always been interested in the question of what happens when you make two genetic mutations at once. Is the outcome additive, or not? When I was a postdoctoral researcher at Sanger – which feels like a million years ago – we did this study in the worm, C. elegans; we asked what happens when we inhibit two genes at the same time.
We did roughly 60,000 combinations. It was the first time anyone had ever done it in an animal. And we found quite interesting stuff. So for example, knocking down chromatin-regulating complexes enhances the effects of lots of different mutations [1]. This fits with human genetics studies that have been done since, where chromatin regulators have been shown to be mutated in many developmental disorders and cancer.
The same principles are still being used now, in cancer research, 15 years later – which other genes can you inhibit to kill cancer cells with a mutated cancer gene without harming normal cells? Technology has advanced, and people are looking at human cells rather than worms, but it’s the same question.
When deep mutational scanning approaches were first starting to be developed I realised that this changed everything we could do. We could create hundreds of thousands of combinations of mutations and measure exactly what happens.
I was lucky to have core institutional funding and so switched the lab a bit, to making lots of mutations in one or two genes, where we already understood that they're working in the same system. We took lots of different bits of molecular biology – individual proteins, protein interactions and RNA splicing – and we put in lots of mutations to see what happened.
The freedom to pursue your own questions and interests at Sanger is amazing. I want to combine quantitative thinking with the ability to perform experiments at scale, and lay the foundations of programmable biology.
This information will not only allow clinicians to better diagnose and understand disease and its effects, but also enable scientists to design and produce new proteins and small molecules for disease treatment and bioengineering.