Explore our funded projects

Below is a list of project titles and summaries of the initiatives funded through the AIchemy Pump Priming Call 2024. These projects explore a range of ideas and approaches, supporting research and development in the field.

For more information on the scoring of the Pump Priming Applications, click here.

Accelerating Materials Discovery: Integrating Machine Learned Force Fields (MLFF) with Monte Carlo Simulations

Jay Zhou – University of Bath

Machine Learned Force Fields (MLFFs) have recently become popular for Molecular Dynamics (MD) simulations. The same is not true of Monte Carlo (MC), which performs exceptionally well for systems such as gases, complex mixtures and adsorption. In fact, the machinery for MLFF-integrated MC is currently absent from open-source platforms. The goal of this project is therefore to develop an open-source software framework that integrates MLFFs with MC simulations to enhance the calculation of various thermodynamic properties of solid-state materials. By leveraging MLFFs trained on ab-initio data, we aim to improve the accuracy of MC simulations in predicting free energy properties, while offering significantly faster compute speed than ab-initio methods.

This approach addresses the limitations of classical force fields (CFFs), which are traditionally used to describe the interactions between atomic and molecular species in MC simulations but often fall short in accuracy. With this project, we commit to programming a reliable interface between the Monte Carlo simulation software package DL_MONTE (a member of the Daresbury Lab software suite) and MLFFs in the form of universal Python functions (callable by ASE). We will also test this interface extensively, using the CHGNet Neural Network Force Field to calculate thermodynamic properties of water via Grand Canonical Monte Carlo (GCMC). The final code package will be released as open source alongside detailed documentation and tutorials to attract community engagement and fast-track the adoption of AI in the MC simulation community.
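As a rough illustration of the coupling described above, the sketch below runs a Metropolis Monte Carlo loop whose energy comes from any Python callable. It is not the DL_MONTE interface and does not use CHGNet or ASE: the names `harmonic_energy`, `metropolis_accept`, `metropolis_step` and `run_mc` are hypothetical, and the harmonic toy potential only stands in for an MLFF energy function.

```python
import math
import random

KB_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def harmonic_energy(positions):
    """Toy stand-in for an MLFF: each atom sits in a harmonic well at the origin."""
    return 0.5 * sum(x * x + y * y + z * z for x, y, z in positions)


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept downhill moves,
    accept uphill moves with probability exp(-dE / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / (KB_EV * temperature))


def metropolis_step(positions, energy, energy_fn, temperature, rng, max_disp=0.1):
    """Displace one randomly chosen atom and accept or reject the trial move."""
    i = rng.randrange(len(positions))
    trial = list(positions)
    trial[i] = tuple(c + rng.uniform(-max_disp, max_disp) for c in positions[i])
    trial_energy = energy_fn(trial)
    if metropolis_accept(trial_energy - energy, temperature, rng):
        return trial, trial_energy, True
    return positions, energy, False


def run_mc(positions, energy_fn, temperature, n_steps, seed=0):
    """Run n_steps single-atom moves; return final state and acceptance ratio."""
    rng = random.Random(seed)
    energy = energy_fn(positions)
    accepted = 0
    for _ in range(n_steps):
        positions, energy, ok = metropolis_step(
            positions, energy, energy_fn, temperature, rng
        )
        accepted += ok
    return positions, energy, accepted / n_steps
```

Because the loop only ever calls `energy_fn(positions)`, swapping the toy potential for a function that wraps a trained force field would, under these assumptions, leave the sampling code unchanged. A grand canonical simulation would additionally need insertion and deletion moves, which this sketch omits.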

AI-Enabled Prediction of Lipid Membrane Composition from Optical Signatures

Dr. Miguel Paez Perez – Imperial College London

Lipid membranes play a key role in biology; they provide structural integrity, are central in intercellular communication, control content exchange, and transduce extracellular signals. Their functionality arises from their unique structure and biophysical properties, which are dictated by the interaction between the membrane’s constituent lipid molecules. Deregulation of these lipid-lipid interactions has been linked to diseases including cancer, malaria, Alzheimer’s disease, and atherosclerosis. From a commercial perspective, membrane composition and lipid-lipid interactions influence the efficacy of lipid-based drug carriers, such as miRNA vaccines. Yet, there is a limited understanding of how the lipid composition of complex, multi-component membranes affects their biophysical behaviour.

To address this challenge, we are developing tools capable of monitoring the biophysical features of lipid bilayers at high throughput and low cost. We will leverage an in-house high-throughput vesicle production device, optical readouts, and AI tools to unlock otherwise inaccessible insight into how the chemistry of complex lipid membranes dictates their biophysical properties.

This project will generate a publicly available, curated dataset to facilitate the development of data-driven biophysical models, and we anticipate its outputs will find applications in areas including antimicrobial resistance research and artificial cell development, supporting key UK strategic areas like engineering biology.

AI-Enhanced Molecular Dynamics: Integrating Long-Range Interactions with Graph Neural Networks

Dr. Devis Di Tommaso – Queen Mary University of London

Molecular Dynamics (MD) is an essential computational tool for studying atomistic-level phenomena. However, methods like ab initio MD, which rely on density functional theory (DFT) to compute energies and forces, are computationally expensive. On the other hand, methods based on classical interatomic potentials (IP) offer speed but lack flexibility. In recent years, machine learning (ML) approaches have promised DFT-quality simulations at faster speeds, but they often neglect long-range interactions, which are crucial for accurately describing liquids, gas-solid and liquid-solid interfaces, and biomolecular systems.

This project aims to develop AI-enhanced MD methods that integrate long-range interactions for predicting both energies and forces. The outcome will be the first version of a code implementing MGT to partially or fully replace the costly and time-intensive traditional computational methods used at each timestep of MD simulations. The model will compute atomic forces, energies, and changes in atomic positions throughout the simulation, enabling a more efficient and scalable approach to studying molecular systems. This will facilitate accelerated atomistic MD simulations that account for both local and long-range interactions.
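To make the long-range problem concrete, the sketch below compares a pairwise Coulomb energy evaluated with and without a distance cutoff, which is roughly what a purely local model sees. It is an illustrative toy, not the project's method: the function name `coulomb_energy` and the example ion pair are assumptions, and a production code would use periodic boundary conditions and a method such as Ewald summation instead of a bare pair sum.

```python
import math

# e^2 / (4 * pi * eps0) expressed in eV * Angstrom
COULOMB_EV_ANGSTROM = 14.399645


def coulomb_energy(positions, charges, cutoff=None):
    """Sum q_i * q_j / r_ij over all pairs (non-periodic).

    If `cutoff` is given, pairs farther apart than `cutoff` are skipped,
    mimicking a model that only sees a local neighbourhood.
    """
    energy = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            if cutoff is not None and r > cutoff:
                continue
            energy += COULOMB_EV_ANGSTROM * charges[i] * charges[j] / r
    return energy
```

For two opposite unit charges 10 Å apart, `coulomb_energy(pos, q)` gives about -1.44 eV, while `coulomb_energy(pos, q, cutoff=6.0)` gives exactly 0.0: the local model misses the whole interaction.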

Bayesian Optimization for Accelerating Metal-Based Antibiotic Discovery

Dr. David Husbands – University of York

Antimicrobial resistance to current treatments poses a growing threat to global healthcare. At the same time, the antibiotic development pipeline remains perilously stagnant. This project aims to accelerate the discovery of novel metal-based antibiotics by integrating machine learning (ML) with high-throughput chemical synthesis and biological evaluation. Collaborating with Atinary Technologies, we will leverage Bayesian Optimization to train ML models with our chemical libraries to predict and iteratively refine iridium(III) metalloantibiotics, maximizing antibacterial potency while minimizing toxicity.

Metal-based antibiotics offer unique structural and functional advantages over organic compounds, yet their discovery remains slow, partially due to the vast chemical space available. Traditional methods of structure-activity relationship elucidation are time-intensive and inefficient. By training an ML model on 1440 iridium(III) complexes, we will virtually screen ~400 million potential compounds from combinations of building blocks, dramatically enhancing the speed of hit identification and the hit-rate. From this virtual screen, two iridium(III) libraries will be synthesized and evaluated using an automated Opentrons system.
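One common way Bayesian optimisation ranks untested candidates is by an acquisition function such as expected improvement, which balances a high predicted score against high uncertainty. The sketch below shows that single step in generic form. It is not the project's or Atinary's implementation: the candidate names, predicted means and standard deviations are made up for illustration, and the surrogate model that would supply them is assumed rather than shown.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_so_far):
    """Expected improvement over `best_so_far` when maximising a score,
    given a surrogate model's predicted mean and standard deviation."""
    if std <= 0.0:
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    return (mean - best_so_far) * normal_cdf(z) + std * normal_pdf(z)


def pick_next(candidates, best_so_far):
    """Return the candidate name with the highest expected improvement.

    `candidates` maps a name to a (mean, std) prediction.
    """
    return max(
        candidates,
        key=lambda name: expected_improvement(*candidates[name], best_so_far),
    )


# Hypothetical predictions for two untested complexes (arbitrary score units).
predictions = {"complex_A": (0.9, 0.01), "complex_B": (0.7, 0.5)}
next_to_make = pick_next(predictions, best_so_far=1.0)
```

Here `complex_B` is selected even though its predicted mean is lower, because its large uncertainty leaves a real chance of beating the current best. In a full loop, the chosen compound would be synthesised and tested, the model retrained, and the ranking repeated.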

Overall, this project is anticipated to yield the following outcomes:

• Identification of 10 lead iridium(III) complexes with high antibacterial activity and low cytotoxicity for further biological evaluation.

• Development of an ML-driven approach to optimize metal-based drug discovery.

• Strengthening of industry-academic collaboration with Atinary Technologies.

• Publication of research findings to advance AI-driven drug discovery.

This project leverages AI for chemical innovation, setting the stage for future applications in metallodrug development for other ailments.

Computer Vision for Predicting the Impact of Additives in Protein Crystallisation

Professor Bao Nguyen – University of Leeds

The rise of new drug discovery modalities has underscored the need for efficient macromolecular crystallization, both as a key characterization technique and as a greener alternative to traditional purification methods in manufacturing. However, the weak intermolecular interactions in protein crystals often lead to instability, necessitating the use of additives that influence protein binding, ionic strength, or nucleation. While standardized crystallization screens are widely used, the underlying intermolecular interactions and nucleation/growth mechanisms remain poorly understood, with only limited systematic studies on a small subset of proteins.

This project aims to address these challenges using AI-driven computer vision to classify and extract morphological data from microscopic images of protein crystallization screens, sourced from the VMXi beamline at Diamond Light Source. The outcome will be an automated workflow for crystal characterization across diverse sources, a robust dataset of microscopic images—including negative results—and AI models trained to predict optimal crystallization conditions and additives. By improving crystallization success rates, this approach advances both protein characterization and scalable purification. Furthermore, the same AI-assisted image processing techniques can be extended to other materials, broadening their impact beyond biomolecular systems.

Enabling Data-Driven Discovery and Reaction Optimisation in Porous Organic Cage Synthesis

Dr. Benjamin D Egleston – Imperial College London

Porous Organic Cages (POCs) are a class of molecular materials with tunable micropore structures that offer significant potential in separation technologies. Recently, our lab has been implementing machine learning tools to assess the accessibility of POCs by encoding chemists’ intuition (doi.org/10.1021/acs.jcim.1c00375). However, traditional methods for synthesising and analysing these materials’ reaction mixtures are limited by the complexity of self-assembly and the unintuitive nature of the species formed. To address this challenge, the project will integrate robotic liquid handling and parallel synthesis with automated data processing and analysis to enable generation of large experimental datasets – providing structural information and thermodynamic data for these complex systems for machine learning or data-driven applications.

Building on recent progress in automated high-throughput screening for combinatorial synthesis of metal-organic cages (doi.org/10.26434/chemrxiv-2024-hl427-v4) and POCs (doi.org/10.1039/D3SC06133G), the project extends these methodologies to even more complex systems. The project will centre on identifying unintuitive structures and intermediates in reaction mixtures by generating large libraries of potential molecules and searching for them in mass spectrometry (MS) data. This will be combined with automated kinetic sampling and analysis of MS data in parallel reactions to enable mapping of entire reaction spaces.

One key goal of this project is to demonstrate that automation of the discovery process, from reaction preparation to data interpretation, can accelerate the identification of novel materials. Generating much greater volumes of detailed data will allow for a deeper understanding of these complex systems. The resulting data-driven foundation will accelerate discovery of novel POCs and other structures that are challenging to predict using traditional intuition.

Exploration of defect superstructure phase diagrams in graphene with Bayesian

Dr. Lukas Hoermann – University of Warwick

The atom-scale design of two-dimensional materials, particularly defective graphene, shows great promise for catalysis, sensing, and energy storage. By integrating experimental growth and analysis with Bayesian-AI-enabled configuration space prediction via the SAMPLE code, we will lay the groundwork for the experimental design to efficiently explore the phase diagram of defective graphene. This project will uncover how experimental parameters—temperature and gas flux—influence the formation of defect superstructures in graphene that govern its electronic and mechanical properties.

Using SAMPLE, we will generate a comprehensive phase space of hundreds of millions of defect superstructures and efficiently predict their formation energies. This dataset will be available on the NOMAD database. We will calibrate the theoretical phase diagram using TEM and AFM images of N-defects in graphene from our collaborators David Duncan, Alexander Saywell (University of Nottingham), and Christopher Allen (Diamond Light Source, University of Oxford). By mapping the experimental structures with a tessellation code (Duncan and Saywell) and computing their formation energies with SAMPLE, we will place these structures within the phase diagram. Using Bayesian-AI, we will learn the functional dependence of the N-concentration and defect composition on the deposition temperature and gas flux during sample preparation. This will enable the prediction of defect patterns at a given deposition temperature and guide future experiments to achieve graphene layers with targeted defect superstructures. The developed approach will be broadly applicable to any defective two-dimensional material or surface, offering a versatile framework for precision surface engineering in a range of applications.

High-Throughput Data-Driven Electrolyte Design to Enable Lithium Metal Batteries

Dr. Neubi Xavier – University of Surrey

Rechargeable batteries are a major part of our everyday lives, and improving them further is crucial for future technology. The gold standard for high-performance next-generation batteries is the use of lithium metal as the anode material. One of the major bottlenecks to enabling lithium metal batteries is the increased reactivity between current electrolyte formulations and lithium, leading to uncontrollable side reactions during operation and ultimately causing battery failures.

Researchers are currently focusing efforts on engineering new electrolyte formulations, leading to hundreds of scientific papers being published each week. The amount of data generated makes it impossible for a single researcher to follow all available literature and hinders the rational design of new electrolytes.

In this project, Dr. Neubi Xavier and Dr. Matthias Golomb aim to collate this vast amount of data into an accessible database that will establish clear reporting standards and serve battery scientists, computational chemists, and AI researchers as a starting point for further experimental and computational investigations. Using large language models, they aim to extract property information on lithium metal electrolytes from a wide range of available scientific literature and identify common core descriptors for high-performing candidates. In addition, they will combine high-throughput atomistic simulations and machine learning to fill gaps in the resulting database, aiming to create the most complete and standardized picture of the lithium metal-compatible electrolyte research landscape to date.

Transforming Chemistry Labs with Safe and Intuitive Human-in-the-Loop Robotic Systems

Dr. Luis Figueredo – University of Nottingham

This project aims to transform chemistry labs through robotics and AI—overcoming adoption barriers while enhancing safety and efficiency. We’ll develop a framework that empowers chemists to intuitively teach robots experimental tasks via multimodal demonstrations, eliminating the need for programming expertise while ensuring stringent safety for seamless human-in-the-loop (HIL) operation. Our approach leverages generative AI for semantic scene understanding, grounded in model-based representations to enhance explainability and safety. This enables robots to interpret dynamic lab environments and manipulate glassware and hazardous substances. A certified safety layer ensures compliance with strict standards, advancing HIL automation in chemistry and aligning with AIchemy’s mission to foster intuitive, high-trust robotics in scientific research.

The automation of chemistry labs remains challenging due, among other reasons, to the requirements for precise and safe manipulation of hazardous substances, diverse glassware, and evolving experimental setups. Traditional robotic solutions require extensive programming expertise, limiting accessibility. Our approach leverages multimodal human demonstrations—combining kinaesthetic, visual, and haptic inputs—to develop constraint-based robotic behaviours that chemists can intuitively guide. Certified safety layers ensure secure robotic handling of hazardous liquids, enabling seamless human-robot collaboration in high-stakes lab environments.

Intended Outcomes

• A human-in-the-loop robotic framework for intuitive chemistry task learning.
• A certified safety architecture ensuring reliable robotic execution in human-centred labs.
• A real-world lab demonstration showcasing precise liquid handling such as controlled pouring and liquid-liquid extractions.
• Open-source tools and methodologies to foster adoption and collaboration.

This project lays the foundation for future fully automated chemistry labs, accelerating discovery while maintaining human oversight and safety.

X-GAMES: Crystallography with Machine Learning

Professor Craig Butts – University of Bristol

We will build a proof-of-principle generative AI tool – X-GAMES – to identify chemical structures directly from powdered samples by combining NMR spectroscopic and X-ray diffraction data. This is of significant value to the pharmaceutical industry, where the chemical structure of molecules and their packing in crystals control their drug properties.

Existing generative AI methods are very good at creating a myriad of images or text on a generalised subject and can also be taught to create molecules that fit broad characteristics, e.g. “make me a molecule that might be drug-like”. However, generative structure determination is a much harder challenge, as it requires generating the one-and-only chemical structure that fits uniquely to a particular set of spectroscopic data. At Bristol we have developed early prototype systems capable of doing this, albeit only for molecules with a few atoms, based on ‘inverting’ a neural network version of our existing IMPRESSION machine learning architecture that was designed to predict solution state NMR parameters. Essentially this prototype predicts structures from spectra, rather than spectra from structures.

The goal of this X-GAMES project is to build out this prototype so that it works for larger, more complex drug-like molecules. To achieve this, we will train X-GAMES on 10-100x larger datasets which integrate X-ray diffraction data as well as the NMR data that our prototype is already designed to use.