Photos credit: Zain Iqbal / Wellcome Sanger Institute

Categories: Sanger Science | 18 December 2024

How can we enhance biological research with AI?

By Katrina Costa, Science Writer at the Wellcome Sanger Institute

Advances in AI are rapidly changing biological research, raising important questions about its responsible use. Here we share insights from a recent conference, funded by the Wellcome Sanger Institute, on explainable AI and its impact on biology.


What is Explainable AI?

Artificial Intelligence (AI) models are everywhere. From digital assistants in our homes, such as Google Assistant and Alexa, to complex analysis tools in laboratories, AI touches all of us, and it is here to stay.

Against this backdrop, there is a clear need for AI predictions to be trustworthy. AI models are often described as a ‘black box’ because we can see the inputs and outputs, but the inner workings remain a mystery. Instead, the world needs AI models that are transparent and ‘explainable’, so we can understand the AI systems’ decisions and assess their reliability.

Last month, the Sanger Institute hosted its first-ever “Explainable AI in Biology (#XAIB24)” conference, organised by the BioDev Network, a community dedicated to future-proofing early-career scientists with the latest computational techniques. During the event, experts from across the globe and a range of disciplines came together to discuss explainable AI and its impact on the future of biology. Here, we will explore some of the key insights shared by the speakers.

RELATED SANGER BLOG

Digital transformation and a new era in science

James McCafferty, Chief Information Officer at the Wellcome Sanger Institute, discusses the latest opportunities and challenges for research, including generative AI and enhanced data science.

AI in disease and healthcare

During the conference, speakers shared examples of how AI is being used across disease research and in healthcare settings. One area is the early detection of diseases such as dementia and cancer, where AI can increase the chance of successful treatment and help ensure the right patients are included in clinical trials. For example, Professor Chris Sander, from the Department of Cell Biology at Harvard Medical School, presented his lab’s research into the early detection of pancreatic cancer, which is difficult to achieve with imaging. His team developed an AI model trained on nine million patient medical records to identify those at highest risk of developing pancreatic cancer within the next three years. The research indicated that mass screening of medical records using AI could help identify people who need further testing for this disease, which often has low survival rates.

Another area of health research accelerated by AI is drug discovery. Professor Philip Kim, from the University of Toronto, discussed how AI can be used to accelerate protein design for new drug treatments. He presented several examples, including his team’s research into zinc finger proteins, which regulate gene expression. His team trained an AI model on billions of interactions between zinc finger proteins and DNA; the model can then generate new zinc finger proteins designed to bind any chosen section of DNA. These custom-engineered zinc fingers offer a promising new technique for gene therapy because they are less likely to trigger an immune reaction than CRISPR-based editing.

AI can also be used in a clinical setting. Darlington Akogo, Founder and Chief Executive Officer at minoHealth AI Labs, presented his Moremi AI tool, which works like ChatGPT but focuses on biology and healthcare information. It can perform tasks across healthcare such as diagnostics, drug discovery, treatment planning, and molecular analysis. The tool also performs tasks it was not specifically trained for, including the generation of new molecules. Users can interact with the AI tool using plain language to answer complex biological questions. In this way, the AI acts as a general-purpose assistant, guided by the expertise of the clinician, and has built-in safety features.

AI in agriculture



Panel discussion at the Explainable AI in Biology conference 2024. Image credit: Zain Iqbal / Wellcome Sanger Institute

How to have responsible AI

Whilst AI has many practical applications, the conference emphasised the need for both explainable and responsible AI. Responsible AI involves designing and developing AI systems in ways that align with social, ethical and legal values. The fundamental goal of biological research is to understand complex biological systems. Traditionally, biology has focused on generating and analysing data within a lab, but so much more can be achieved by analysing massive datasets, such as combining genomic data with healthcare information. AI models thrive on enormous datasets, but there are obvious challenges to integrating AI with sensitive data if we want a useful research resource that also maintains data privacy.

Professor Tim Hubbard, Director of ELIXIR, underscored the importance of AI operating within Trusted Research Environments (TREs): secure computing environments that provide approved researchers with a single point of access to data. A service wrapper, a computer programme that encapsulates code and presents a service to the user, could surround the AI to help prevent private data being shared outside the TRE. Beyond health, these large secure datasets are also vital for researching biodiversity, climate and food security.
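As a rough illustration of the service-wrapper idea, the sketch below mediates between a researcher's query and the data, returning only aggregate results. All names, the toy model and the disclosure-control rule are invented for illustration, not drawn from any real TRE:

```python
# Minimal sketch of a TRE-style service wrapper: the user submits a query,
# the wrapper runs the model inside the secure environment, and only
# aggregate, non-identifying results come back out.

def run_model(query, records):
    # Stand-in for the AI model: count records above a risk threshold.
    return sum(1 for r in records if r["risk_score"] >= query["threshold"])

def service_wrapper(query, records, min_count=5):
    """Answer a query without exposing row-level data.

    Results below `min_count` are suppressed, a common disclosure-control
    rule that stops small counts from identifying individuals.
    """
    count = run_model(query, records)
    if count < min_count:
        return "suppressed (count too small)"
    return count

# Approved researchers see only the aggregate answer, never the records.
records = [{"risk_score": s} for s in (0.2, 0.9, 0.8, 0.95, 0.7, 0.1, 0.85, 0.9)]
print(service_wrapper({"threshold": 0.6}, records))  # → 6
```

The key design point is that the raw records never cross the wrapper boundary; only vetted, aggregate outputs do.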

Federated learning is another important concept for addressing data privacy concerns. This is a decentralised method for training AI: rather than pooling data in one place, the model is trained locally at each site and only the model updates are shared, so the raw data never leave their source.
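The core of many federated learning schemes is federated averaging: each site computes a model update on its own data, and a coordinator averages the parameters. A minimal sketch, using an invented one-parameter model and made-up data for two hypothetical hospitals:

```python
# Minimal sketch of federated averaging (FedAvg): each site trains on its
# own data and shares only model parameters, never the data itself.

def local_update(w, data, lr=0.1):
    # One gradient-descent step for a one-parameter model y = w * x,
    # computed using only this site's local data.
    grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
    return w - lr * grad

def federated_round(global_w, sites):
    # Each site updates the model locally; only the weights travel.
    local_ws = [local_update(global_w, data) for data in sites]
    return sum(local_ws) / len(local_ws)  # simple unweighted average

# Two hospitals with private datasets drawn from roughly y = 2x.
sites = [[(1, 2.1), (2, 3.9)], [(1, 1.8), (3, 6.2)]]
w = 0.0
for _ in range(50):
    w = federated_round(w, sites)
print(round(w, 2))  # converges to a value close to 2
```

Real systems add safeguards such as secure aggregation and differential privacy on top of this basic loop, because model updates themselves can leak information.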

Another key component of responsible AI is the design approach called human-in-the-loop, in which humans work closely with the machine learning algorithm rather than being replaced by it. Human-in-the-loop training involves people guiding, correcting and evaluating the algorithm, producing results potentially more accurate than either AI or humans could achieve alone.
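A common human-in-the-loop pattern routes the model's least confident predictions to an expert for review. The sketch below illustrates that routing step; the model and the reviewer are both stand-in functions invented for the example:

```python
# Minimal sketch of a human-in-the-loop review step: confident predictions
# are accepted automatically, uncertain ones go to a human expert.

def model_confidence(item):
    # Stand-in for a classifier's confidence score in [0, 1].
    return item["score"]

def human_label(item):
    # Stand-in for an expert reviewer supplying the correct label.
    return item["true_label"]

def human_in_the_loop(items, threshold=0.8):
    labelled = []
    for item in items:
        if model_confidence(item) >= threshold:
            label = item["predicted"]   # model is confident: accept
        else:
            label = human_label(item)   # uncertain: ask the human
        labelled.append((item["id"], label))
    return labelled

items = [
    {"id": 1, "score": 0.95, "predicted": "benign", "true_label": "benign"},
    {"id": 2, "score": 0.55, "predicted": "benign", "true_label": "malignant"},
]
print(human_in_the_loop(items))  # item 2 is routed to the human reviewer
```

In a training setting, the expert's corrections would also be fed back to retrain the model on exactly the cases it finds hardest.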

Speakers also recognised the challenge of genomic data being historically biased towards white males from European and North American backgrounds, with the underrepresentation of many demographic groups. As a result, there is a concern that building models based on these datasets will not accurately represent everyone. Increasing the diversity of the training datasets is vital to help ensure the model performs well across diverse populations. The experts also acknowledged the need for ongoing ethical discussion and regulation in the field.

THE BIODEV NETWORK

Improving digital skills with the BioDev Network

Interested in improving your digital skills and career prospects? The BioDev Network provides early career scientists with hands-on training in AI and computational biology. Find out more about the BioDev Network.

The future of AI in biology

Overall, presenters at the conference were optimistic that AI will continue to help advance biological research, especially as models become more sophisticated and better able to predict biological function. Given that AI can handle large, complex datasets, the models could help speed up the research process and automate workflows, leading to faster discoveries. As high-quality biological data become more available, the next steps for AI will involve integrating data across different scales – from gene regulation to protein interaction to neural networks. This will enable AI models to work across different biological systems.

During the panel discussions, experts also considered the rise of foundation models, which are AI systems pre-trained on huge datasets. While academic settings often lack the resources to build generalist models from scratch, there is an opportunity to fine-tune existing foundation models. One example is developing digital twins for clinical trials, in which a virtual model of a patient, cell, tissue or organ is created using AI.

The presenters agreed that AI should be viewed as a collaborative tool, providing computational insights that complement human expertise. With this in mind, it is clear that researchers need the relevant skills to work with AI to support innovation. Dr James McCafferty, Chief Information Officer at the Sanger Institute, emphasised that AI is rapidly transforming genomics, drug discovery and data-driven research, so researchers from all disciplines need to collaborate to fully realise the benefits of this technology.

Footnote:

The conference organisation was coordinated by Ronnie Crawford, Informatics and Digital Associate at the Sanger Institute. The BioDev Network is led by Dr Priyanka Surana, Informatics and Digital Solutions at the Sanger Institute.

For a simple guide to key concepts in AI, see our article: Using artificial intelligence for genomic research on the YourGenome website.