Image credit: Mark Thomson / Wellcome Sanger Institute

Categories: Sanger Science

17 October 2024

AI and the future of generative biology

By Katrina Costa, Science Writer, Wellcome Sanger Institute

Artificial intelligence (AI) is transforming biology by enabling researchers to build predictive models from vast biological datasets. Wellcome Sanger Institute researchers are leveraging AI tools to predict, design, and engineer biological sequences, such as DNA and proteins.

Why is AI useful for genomics?

Artificial Intelligence (AI) is valuable in genomics because it enables researchers to analyse vast amounts of complex genomic data more efficiently and accurately than before. For example, each human genome contains around 3 billion base pairs¹, and large-scale studies can involve hundreds of thousands of genomes. AI can also help identify patterns and correlations in data that are too subtle or complex for humans to detect, and predict the impact of specific changes. The caveat is the risk of false positives (identifying patterns that are not real), so large amounts of data can be needed to gain confidence in the findings.
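To give a feel for the scale involved, here is a back-of-envelope sketch in Python. The cohort size of 500,000 genomes is a hypothetical figure chosen only to represent "hundreds of thousands"; the function name is likewise made up for this illustration.

```python
# Back-of-envelope estimate of raw sequence volume in a large genomic study.
BASE_PAIRS_PER_GENOME = 3_000_000_000  # approximate figure quoted in the text


def total_base_pairs(n_genomes: int) -> int:
    """Return the total number of base pairs across n_genomes human genomes."""
    return n_genomes * BASE_PAIRS_PER_GENOME


cohort_size = 500_000  # hypothetical cohort size, for illustration only
total = total_base_pairs(cohort_size)
print(f"{cohort_size:,} genomes ≈ {total:.1e} base pairs")
```

Under these assumptions the study would cover roughly 1.5 quadrillion base pairs, far more than anyone could inspect by hand.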

Generative AI could transform the field of genomics by offering innovative tools and approaches to understanding complex biological data. It will become increasingly important as genomic data grows in volume and complexity, and AI tools become more powerful. This has led to the new scientific field of generative genomics.

The cost of sequencing a single human genome has fallen dramatically since the first was completed in 2003. That original assembly cost around $3 billion. The arrival of next-generation sequencers in 2007 saw the cost fall from millions of dollars to thousands, with the $1,000 milestone reached in the late 2010s. By 2022, the Sanger Institute could sequence a human genome for around $500.

Generative genomics – a new era for biology

Generative genomics is a growing field that combines advanced computational techniques, especially generative AI (gen AI), with genomic research. Scientists can use AI models to understand and analyse genetic sequences or predict sequence properties and design new sequences. This moves biology from a descriptive to a more predictive and ultimately engineering realm, advancing the domains of medicine, biotechnology, synthetic biology and beyond.

Scientists in this field train generative AI models on large genetic sequence datasets of DNA or RNA from various organisms. The models use these genetic sequences to detect patterns, structures, and functional components.

Once trained, gen AI models can create new genetic sequences with specific properties. For example, the models could optimise gene sequences for enhanced gene expression or regulation, or design new proteins for therapeutic applications.
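The train-then-generate idea described above can be sketched with a deliberately tiny stand-in. The Python example below uses a simple Markov chain, not the large neural models used in real generative genomics, and the training sequences are made up. It learns which base tends to follow each pair of bases, then samples a new sequence from those learned patterns.

```python
import random
from collections import defaultdict


def train_markov(sequences, order=2):
    """Count which base follows each context of `order` bases in the training data."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(len(seq) - order):
            context = seq[i:i + order]
            next_base = seq[i + order]
            counts[context][next_base] += 1
    return counts


def generate(counts, seed, length, rng):
    """Sample a new sequence one base at a time; len(seed) must equal the training order."""
    order = len(seed)
    out = list(seed)
    while len(out) < length:
        context = "".join(out[-order:])
        options = counts.get(context)
        if not options:  # context never seen in training: stop early
            break
        bases = list(options)
        weights = [options[b] for b in bases]
        out.append(rng.choices(bases, weights=weights, k=1)[0])
    return "".join(out)


# Made-up toy sequences, not real genomic data.
training = ["ATGCGATACGCTTGA", "ATGCCGTTAGCATGA", "ATGGCGATTACGTGA"]
model = train_markov(training, order=2)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(generate(model, seed="AT", length=15, rng=rng))
```

Every three-letter stretch in the generated sequence also occurs somewhere in the training data, which shows both the appeal and the limits of such a simple model: it can recombine local patterns it has seen, but it has no notion of function.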

The Sanger Institute recently launched the world’s first Generative and Synthetic Genomics research programme. This team aims to build foundational datasets and models to engineer biology, in a similar way to how we engineer electronics. This would increase our understanding of genomes and could lead to personalised medical treatments.

Gen AI can enhance the field of genomics in several important ways. For example, whilst scientists have a solid understanding of the components behind biological systems, as well as an understanding of how they cause disease when they fail, it is not easy to predict how these systems will respond to change – even simple changes such as switching one letter in the DNA sequence of a disease-associated gene.

This knowledge gap can be tackled using gen AI models, which could help scientists predict the impact of genetic changes within an individual, and within and between populations. This could help identify which mutations are involved in certain diseases. The Sanger Institute specialises in large-scale genomic data generation, so it is ideally positioned to create the foundational datasets.
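To illustrate why even "simple" single-letter changes are hard to assess one by one, the Python sketch below lists every possible one-letter substitution in a short, made-up DNA sequence. The function name and the sequence are hypothetical, and the code assumes an uppercase sequence containing only A, C, G and T.

```python
def single_base_variants(seq: str):
    """Yield (position, original_base, new_base, variant) for every one-letter substitution."""
    for pos, ref in enumerate(seq):
        for alt in "ACGT":
            if alt != ref:
                yield pos, ref, alt, seq[:pos] + alt + seq[pos + 1:]


toy_gene = "ATGGCC"  # made-up sequence, for illustration only
variants = list(single_base_variants(toy_gene))
print(len(variants))  # three alternative letters at each of six positions
for pos, ref, alt, variant in variants[:3]:
    print(f"{ref}{pos + 1}{alt}: {variant}")
```

A sequence of length L has 3 × L single-letter variants, so a genome of around 3 billion base pairs has roughly 9 billion of them before combinations of changes are even considered. Testing each one in the laboratory is impractical, which is where predictive models come in.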

In protein biology, researchers can use AI to predict protein structures (for example, with the tool AlphaFold) and to design proteins for new drugs, enzymes and biomaterials. Beyond generative and synthetic genomics research, AI could benefit single-cell genomics, which generates vast quantities of RNA sequence data. Here, gen AI could reveal which steps create or eliminate the differences between cells. This could help scientists understand how cells function and how they change in response to their environment.

Biological research is becoming more ‘multi-omic’, integrating diverse data types such as genomic, transcriptomic and epigenomic data. AI could increase our understanding of how these data are connected and reveal previously hidden patterns.

AI wins two 2024 Nobel Prizes

Professors Geoffrey Hinton (known as the “Godfather of AI”) and John Hopfield have been awarded the Nobel Prize in Physics for their work on machine learning.

Professors Demis Hassabis (co-founder of Google DeepMind) and John Jumper have shared the Nobel Prize in Chemistry for creating AlphaFold2, an AI tool that can predict the structures of almost all known proteins. The other half of the prize went to David Baker for computational protein design.

“We believe that scientific capability for addressing most biological problems is currently hampered by limitations in data. Our goal is to produce data at scale in a fast and cost-effective way, which can then be used to train predictive and generative models. By producing large-scale genetic sequences and predicting the impact of genetic changes, gen AI tools can help accelerate our understanding of genome biology. As we draw on the power of AI, we aim to engineer biology with precision, and eventually reimagine the future of genomic research and its eventual impact on areas such as healthcare, agriculture and biotechnology.”

Professor Ben Lehner,
Head of Generative and Synthetic Genomics, Wellcome Sanger Institute

Synthetic genomics

Generative AI is also set to transform synthetic genomics – a field of biotechnology where scientists design and create new artificial genetic sequences. Traditionally researchers used genetic engineering techniques, changing specific genes inside an organism. In contrast, with synthetic genomics scientists can create entire new genomes or genomic segments by chemically synthesising and then assembling DNA or RNA sequences. Synthetic genomics can also enhance gene and genome editing.

At the Sanger Institute, Leopold Parts leads a research group that aims to develop new technologies to write and edit genomes at scale, and to speed up biological design. The team’s goal is to understand the impact of DNA mutations by engineering changes to cellular DNA. In the laboratory, they use genome engineering and synthetic biology to produce the data for training AI models. At their computers, they develop tools to analyse the data and propose new DNA designs.

Jussi Taipale will be joining the Generative and Synthetic Genomics programme in early 2025. Find out more about his group on his research profile.

These new capabilities for engineering biology come with an important responsibility to consider and explore their ethical, legal and social implications. The Institute’s Policy Team carried out initial research with international stakeholders to consider the ethical implications of creating synthetic genomes. Sanger Institute researchers are working with the Policy Team to build on this research, proactively considering the implications of the new programme and developing processes for responsible governance and wider engagement.

What’s next for AI in genomics?

Artificial Intelligence (AI) – especially generative AI – is rapidly advancing into our everyday lives, and will play a crucial role in the future of genomic research. As genomic datasets continue to grow in volume and complexity, AI will become essential for efficiently analysing and understanding these research outputs.

Generative AI could enhance much of the scientific understanding of genomics, including genetic variation, how mutations affect DNA function, and even how to create new tailored genetic sequences and cells. This may bring us closer to personalised medicine. Scientists will apply AI across multi-omics to gain a more thorough understanding of biological processes.

However, as AI tools continue to push technological boundaries, it is essential that research organisations such as the Sanger Institute embed ethical thinking into the design and delivery of their research, and that governing bodies invest in addressing the social and ethical implications. Responsible and explainable AI is essential for securing public trust and maximising the benefits these tools bring. The Sanger Institute is well-positioned to be a leader in this field, with its large-scale data generation and investment in AI-supported research and the ethics surrounding it.

Footnote:

For a simple guide to key concepts in AI, see our article: Using artificial intelligence for genomic research on the YourGenome website.