Explore our Funded Projects

Browse through the projects funded by AIchemy, highlighting cutting-edge research, training, and collaboration at the intersection of AI and chemistry.

This page will be regularly updated with project outcomes, results, and new developments, so please check back often to stay informed about the latest progress from these projects.

Student Internships
Pump Priming Funding Call
Collaborative Travel
Frontier Fund

Accelerating Chemistry Lab Automation Through AI-Driven Robotics

Alex Wright
Host University: University of Liverpool
Host Academic: Dr. Gabriella Pizzuto

Alex’s project investigated the simulation-to-reality gap in particle behaviour for robotic chemistry, using the ORBIT open-source robotics simulator. He developed a method to represent this gap in simulated particles to improve their usability for robotic chemists. The project involved deploying a simulated robotic environment to measure the angle of repose and comparing these results to real-world experiments. Ultimately, the goal was to refine simulation parameters to better align virtual and physical systems.

The project resulted in the development of a low-cost powder flow testing method to analyse discrepancies between simulated and real particle behaviour. Key findings revealed significant challenges in accurately replicating particle collisions, as small particles passed through simulation boundaries, skewing comparisons. Additionally, simulated salt particles differed markedly from their real counterparts in appearance and flow, producing an unrealistically low angle of repose. These findings highlight the need for further refinement of environment parameters or alternative particle models. With more time and data, the workflow could be improved to reduce these discrepancies and enhance the simulator’s realism.
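As a rough illustration of the quantity being compared, the angle of repose of a conical pile can be estimated from its height and base radius. The sketch below assumes an idealised cone; the function name and the measurements are hypothetical and are not taken from the project.

```python
import math


def angle_of_repose(pile_height: float, base_radius: float) -> float:
    """Angle of repose in degrees for an idealised conical pile."""
    if base_radius <= 0:
        raise ValueError("base_radius must be positive")
    return math.degrees(math.atan2(pile_height, base_radius))


# Hypothetical measurements in millimetres (illustrative only).
real_angle = angle_of_repose(pile_height=21.0, base_radius=30.0)
sim_angle = angle_of_repose(pile_height=9.0, base_radius=30.0)

# A positive gap means the simulated pile is flatter than the real one.
gap = real_angle - sim_angle
print(f"real={real_angle:.1f} deg, simulated={sim_angle:.1f} deg, gap={gap:.1f} deg")
```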

Through this project, Alex strengthened his ability to work within a collaborative coding environment, building upon existing particle physics frameworks and integrating new methods. He gained foundational experience with ROS, learning how simulated environments can be transferred to physical robotic setups. He also improved his communication and teamwork skills, as he frequently collaborated with colleagues to troubleshoot and refine code. Additionally, Alex improved his technical abilities in 3D CAD modelling and scientific presentation, broadening his practical and professional skill set beyond coding alone.

Bayesian Optimization for Chemical Applications

Eltjo Mante
Host University: Imperial College London
Host Academic: Prof. Kim Jelfs

This project aimed to enhance a Bayesian Optimization web application (web-BO) and explore multi-fidelity Bayesian Optimization (MFBO) for chemical problems. The first goal was to improve the usability and functionality of web-BO through bug fixes and new features. The second goal involved investigating MFBO methods in the context of chemical simulations, particularly focusing on the 6D Hartmann function.

Key outcomes included the development of a video tutorial page to replace a looping GIF, enabling users to pause and navigate freely. An explanations page was added to clarify tool functionality and input formats, complete with sidebar navigation and tooltips. Several bugs were resolved, including overlapping input cells and trailing decimals, and graph displays were cleaned up for clarity.

On the MFBO side, the project compared low- and high-fidelity optimization methods, revealing that low-fidelity approaches often reached the global minimum more effectively than high-fidelity ones, which tended to get stuck in local minima. A major challenge was modeling correlation between fidelities, which proved to be highly problem-specific and complex. Future work could focus on integrating MFBO into web-BO, with correlation modeling as a key step.
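For readers unfamiliar with the benchmark, the 6D Hartmann function can be written in a few lines. The sketch below uses the standard published constants; the low-fidelity variant, which perturbs the first weight, is an assumption for illustration and is not necessarily the fidelity model used in the project.

```python
import math

ALPHA = (1.0, 1.2, 3.0, 3.2)
A = (
    (10.0, 3.0, 17.0, 3.5, 1.7, 8.0),
    (0.05, 10.0, 17.0, 0.1, 8.0, 14.0),
    (3.0, 3.5, 1.7, 10.0, 17.0, 8.0),
    (17.0, 8.0, 0.05, 10.0, 0.1, 14.0),
)
# Entries of P are conventionally tabulated scaled by 1e4.
P = (
    (1312, 1696, 5569, 124, 8283, 5886),
    (2329, 4135, 8307, 3736, 1004, 9991),
    (2348, 1451, 3522, 2883, 3047, 6650),
    (4047, 8828, 8732, 5743, 1091, 381),
)

# Commonly quoted location of the global minimum (value about -3.32237).
X_STAR = (0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573)


def hartmann6(x, alpha=ALPHA):
    """High-fidelity 6D Hartmann function on the unit hypercube."""
    total = 0.0
    for weight, row_a, row_p in zip(alpha, A, P):
        inner = sum(a * (xj - p * 1e-4) ** 2 for a, xj, p in zip(row_a, x, row_p))
        total += weight * math.exp(-inner)
    return -total


def hartmann6_low_fidelity(x, shift=0.1):
    """Hypothetical cheaper approximation: same form, perturbed first weight."""
    alpha = (ALPHA[0] - shift,) + ALPHA[1:]
    return hartmann6(x, alpha)


print(hartmann6(X_STAR), hartmann6_low_fidelity(X_STAR))
```

How strongly the two fidelities correlate depends on the size of `shift`, which loosely mirrors the problem-specific correlation issue described above.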

Eltjo gained hands-on experience in machine learning, full-stack web development (Flask, HTML, CSS, Bootstrap, JavaScript), and version control using Git and GitHub. The collaborative environment also fostered growth in communication and teamwork, balancing independent problem-solving with guidance from experienced peers.

Data Science Approach to Crystal Structures

Ziqiu Jiang
Host University: University of Liverpool
Host Academic: Prof. Vitaliy Kurlin

This project aimed to use the average minimum distance (AMD) metric to assess the similarity between novel predicted crystal structures and experimentally determined structures stored in the Cambridge Structural Database (CSD). This comparison is essential for validating computational predictions of crystal structures by providing a quantitative measure of how closely they align with known experimental data.

Ziqiu successfully computed the nearest neighbours of predicted crystal structures by comparing them with over 850,000 experimental crystal structures from the CSD. This was achieved using the Average Deviation from Asymptotic (ADA) metric, an invariant that accounts for both geometric considerations (through AMD) and the density of the atomic points in crystals. It was observed that most predicted crystal structures with high density tend to closely match experimental structures. However, lower-density predicted crystals often displayed more significant gaps when compared to experimental structures. These cases have the potential to inform further research in crystal structure prediction and validation.
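To make the AMD idea concrete, the sketch below computes the first k average minimum distances for a small periodic point set by brute force: for each motif point it finds the distances to its k nearest neighbours in the infinite crystal and averages them over the motif. This is a minimal illustration, not the group's implementation; the function name and the `reach` parameter (how many neighbouring cells to generate) are assumptions, and `reach` must be large enough for the requested k.

```python
import itertools
import math


def amd(cell, motif_frac, k, reach=2):
    """First k average minimum distances of a periodic point set.

    cell: three lattice vectors (rows); motif_frac: fractional coordinates.
    """
    def to_cart(frac):
        return tuple(sum(frac[i] * cell[i][j] for i in range(3)) for j in range(3))

    motif = [to_cart(f) for f in motif_frac]
    shifts = [to_cart(t) for t in itertools.product(range(-reach, reach + 1), repeat=3)]

    rows = []
    for p in motif:
        dists = []
        for q in motif:
            for s in shifts:
                d = math.dist(p, tuple(qc + sc for qc, sc in zip(q, s)))
                if d > 1e-9:  # skip the point itself
                    dists.append(d)
        dists.sort()
        rows.append(dists[:k])

    # Average the j-th neighbour distance over all motif points.
    return [sum(row[j] for row in rows) / len(rows) for j in range(k)]


# Simple cubic lattice with unit edge: six neighbours at 1, then twelve at sqrt(2).
cubic = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
values = amd(cubic, [(0.0, 0.0, 0.0)], k=7)
print(values)
```

Two crystals can then be compared by a distance between their AMD vectors, for example the largest absolute difference between corresponding entries.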

Ziqiu gained a deeper understanding of mathematical invariants and their applications in computational chemistry. He also learned key concepts in crystallography, particularly the structural properties of crystals and the definition of crystal components, and how these concepts can be integrated into computational models. By applying algorithms to compare predicted crystal structures with experimental data, he developed practical skills in analysing crystal structures and identifying structural similarities, further reinforcing his computational and analytical expertise.

Learning from large-scale CSP – database-informed prediction of spontaneous resolution

Leo Arogundade
Host University: University of Southampton
Host Academic: Prof. Graeme Day

This project aimed to investigate classifier models for predicting chiral resolution by crystallisation of small organic molecules. The models were trained on computational data comparing the relative stabilities of racemic and enantiopure crystal structures. As part of the workflow, Leo used the Cambridge Structural Database API to extract molecular files, applied RDKit to calculate molecular descriptors, and trained logistic regression and support vector machine models to predict crystallisation outcomes. The project contributes to understanding how molecular features influence chiral resolution and explores the potential of machine learning in crystal engineering.

Leo applied several classification models to predict chiral resolution based on energy differences between racemic and enantiopure crystal structures from a dataset of 298 molecules. Initial models (SVM, logistic regression, decision tree) achieved over 90% accuracy but were biased toward predicting racemic crystallisation due to class imbalance. A second round of modelling with balanced data reduced bias but also lowered accuracy, which was later improved through feature selection and de-correlation, resulting in final accuracies of 83% (SVM), 62% (logistic regression), and 73% (decision tree). Toward the end of the project, Leo began planning how to expand the dataset by proposing new molecules for future study in the forerunner CSP project. These outcomes highlight both the potential and limitations of machine learning in predicting crystallisation behaviour.
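The effect of class imbalance on headline accuracy can be shown with a toy example. The labels below are hypothetical (a 90/10 split, not the project's 298-molecule dataset): a model that always predicts the majority class scores 90% accuracy while being no better than chance on a class-balanced metric.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels and a classifier that always says "racemic".
y_true = ["racemic"] * 90 + ["enantiopure"] * 10
y_pred = ["racemic"] * 100

print(accuracy(y_true, y_pred))           # majority-class accuracy
print(balanced_accuracy(y_true, y_pred))  # chance-level on balanced metric
```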

Through this project, Leo gained practical experience in literature analysis, molecular data extraction using RDKit and MOL files, and using the Cambridge Structural Database API to retrieve crystal information via refcodes. He developed a strong understanding of machine learning algorithms—support vector machines, logistic regression, and decision trees—and learned how to implement them in Python. Additionally, he improved his presentation skills, and learned to set up computing environments on a supercomputer (Iridis5), broadening his technical capabilities in high-performance computing.

Machine Learning Tight Binding for Proton Battery

Xuheng Zhao
Host University: Imperial College London
Host Academic: Dr. Jarvist Frost

Xuheng’s project aimed to optimise Density Functional Tight Binding (DFTB) parameters from a standard database specifically for targeted materials, using machine learning techniques with the assistance of the Tight Binding Machine Learning Toolkit (TBMaLT). These refined parameters are intended for use in molecular dynamics simulations to predict material properties, contributing to the development of a proton battery. The project focused on building and validating a workflow using small molecules to demonstrate its feasibility and accuracy. Ultimately, the goal is to establish a robust, transferable simulation pipeline for future energy-related materials research.

A machine learning workflow using TBMaLT was successfully built to optimise DFTB parameters, involving dataset preparation, input formatting, model training with PyTorch, and performance evaluation against DFT ground truth. However, due to limitations in TBMaLT’s current development stage, the predicted properties lacked sufficient accuracy, and final parameter extraction for molecular dynamics was not completed. In parallel, Xuheng developed a non-ML optimisation workflow using the Nelder-Mead algorithm, which directly refines DFTB parameters via DFTB+ calculations and MSE loss minimisation. While promising, this workflow requires further refinement to ensure physically meaningful Hamiltonian matrices. Together, these efforts lay the groundwork for a robust simulation pipeline to support proton battery material design.

Through this project, Xuheng learned the fundamentals of tight-binding methods and how they integrate with AI in scientific research, enhancing her understanding of AI’s role in advancing science. Additionally, she gained knowledge in compiling and using software such as Gaussian, FHI-aims, and DFTB+, understanding their suitability and limitations for various purposes. She also acquired a basic understanding of the development of scientific software packages, like TBMaLT.

Machine Learning-Assisted High-Throughput Screening for Organic Semiconductors: A Comprehensive Study and Database Development

Malin Zollner
Host University: University of Strathclyde
Host Academic: Dr. Tahereh Nematiaram, Dr. Yashar Moshfeghi

Malin’s project aimed to explore the research landscape of organic semiconductors, focusing on their applications in sensors, photovoltaics, and light-emitting diodes. A key objective was to establish a literature-based database containing experimental data such as power conversion efficiency and structural/electronic fingerprints of donor and acceptor materials. Malin also planned to compute structural fingerprints and replicate machine learning methodologies from previous studies to trial high-throughput screening techniques. The ultimate goal was to support materials selection and optimisation using machine learning for organic photovoltaic applications.

She conducted a thorough literature review on organic semiconductors, identifying key descriptors, fingerprints, and machine learning algorithms used in sensor, photovoltaic, and LED applications. She developed code to run machine learning models and generate a database of molecular fingerprints and descriptors, which is now available for future research. A detailed database of organic donor and acceptor molecules was compiled, enhanced with Morgan, Daylight, and MACCS fingerprints. This resource supports ongoing efforts in machine learning-assisted materials screening for organic photovoltaics.

Throughout the project, Malin gained a solid understanding of the fundamentals of organic semiconductors and how artificial intelligence is being applied in this field through literature review and research. Regular meetings with her supervisors helped her improve presentation skills and deepen her understanding of the research process. Under expert guidance, she developed and refined her coding skills, which were especially valuable given her limited prior experience. She also learned how to run machine learning algorithms and became familiar with key AI terminology, marking her first steps into computational research. Overall, the project provided a strong foundation for future work in AI-driven materials science.

Mapping the chemical space of intermetallic compounds

Ryan Napo Nduma
Host University: Imperial College London
Host Academic: Prof. Aron Walsh and Anthony Onwuli

The aim of the project was to learn and understand the chemical properties that distinguish intermetallics from other possible compounds within the chemical space, and then to use machine learning and relevant computational techniques to contribute to the Walsh Materials Design group’s SMACT software by building a series of filters and rules. These filters and rules would allow SMACT to appropriately screen for these compounds within the chemical space.

Ryan developed two key rules, the Valence Electron Count and the Electronegativity Difference, to help distinguish intermetallic compounds, based on insights from literature and guidance from Prof. Walsh and Anthony. He applied these features in a classification exercise using random forest and logistic regression models, exploring how feature weighting affects model performance. His contributions included writing code that was integrated into the latest version of SMACT, enabling new functionality such as Valence Electron Count calculations. Additionally, he helped improve SMACT’s usability by fixing examples and tutorials. These outcomes align with the project’s goal of mapping the chemical space of intermetallics and enhancing tools for materials discovery.
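The two descriptors can be sketched for a simple composition as follows. This is not SMACT's API: the function names are hypothetical, the valence electron count is assumed here to be the atom-averaged number of valence electrons, and the small lookup tables hold illustrative values (Pauling electronegativities) for four elements only.

```python
# Illustrative element data; a real tool would use a full periodic-table source.
VALENCE_ELECTRONS = {"Ni": 10, "Al": 3, "Na": 1, "Cl": 7}
PAULING_EN = {"Ni": 1.91, "Al": 1.61, "Na": 0.93, "Cl": 3.16}


def valence_electron_count(composition):
    """Average number of valence electrons per atom (assumed definition)."""
    total_atoms = sum(composition.values())
    electrons = sum(VALENCE_ELECTRONS[el] * n for el, n in composition.items())
    return electrons / total_atoms


def electronegativity_difference(composition):
    """Largest spread in Pauling electronegativity among the elements present."""
    values = [PAULING_EN[el] for el in composition]
    return max(values) - min(values)


ni3al = {"Ni": 3, "Al": 1}  # a typical intermetallic
nacl = {"Na": 1, "Cl": 1}   # an ionic compound, for contrast

print(valence_electron_count(ni3al), electronegativity_difference(ni3al))
print(valence_electron_count(nacl), electronegativity_difference(nacl))
```

In this toy comparison the intermetallic shows a much smaller electronegativity difference than the ionic salt, which is the kind of signal a screening filter could use.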

Through this project, Ryan gained a deeper understanding of materials informatics, including its potential to drive innovation in sustainable and high-performance materials across industries. He developed practical skills in machine learning, mastering basic techniques like regression and classification while beginning to explore advanced concepts such as regularization and dimensionality reduction. Through regular meetings and workshops, he improved his presentation and communication skills, learning to express ideas clearly in a research setting. Ryan also learned how to initiate and manage a research project, from defining objectives to developing impactful solutions. Importantly, he took his first steps into computational research, gaining confidence in coding and working within a collaborative research environment.

Modelling Phase Transitions: Characterising Henry’s Law

Josh Cheung
Host University: University of Southampton
Host Academic: Dr. Joanna Grundy, Prof. Jeremy Frey

This project’s main focus was to use machine learning models to predict values for solubility (log S) and Henry’s law constant (kH), exploring the links between the two properties. This was completed via data curation to create a combined dataset, followed by data processing, training and selection of machine learning models, and analysis of predictions against a withheld test set.

Two machine learning models were successfully developed to predict solubility and Henry’s law constant, trained on large datasets and tested on a shared subset of 2,563 datapoints. Both models achieved acceptable accuracy, with around 50% of predictions scoring below 1 for mean squared error (MSE), and the Henry’s law model performed exceptionally well for hydrocarbons. However, due to time constraints, the analysis of feature importance was not completed, leaving room for future investigation using techniques like recursive feature elimination and SelectKBest. A number of data quality and modelling flaws were identified, highlighting opportunities for improvement in data sanitation, feature engineering, and model refinement. The project lays a strong foundation for further development and analysis in predictive modelling of chemical properties.

The project provided Josh with hands-on experience in data curation, model training, and result evaluation. They also gained valuable transferable skills in project organisation, time management, and scientific reporting. Importantly, this experience deepened their understanding of machine learning in physical chemistry and helped guide their career aspirations towards computational and data-driven research.

A Computer Vision Approach to Understanding Polymer Swelling Kinetics

Alex McKissock
Host University: University of Strathclyde
Host Academic: Dr. Marc Reid

This project focused on studying gallium deformation in acidified salt solutions and analysing the movement with Kineticolor computer vision software.

Alex’s internship project successfully adapted Kineticolor computer vision software to investigate gallium metal locomotion, building on recent research in liquid metal chemistry. He developed expertise in computer vision analysis by studying published examples and producing high-quality experimental videos using prepared salt solutions. His findings revealed distinct stages of gallium deformation and showed that acidity accelerates the process, with Galinstan breaking down in acidified silver nitrate. The project demonstrated the potential of computer vision in studying dynamic materials and highlighted its relevance to soft robotics and chemistry education.

By working in a research lab, Alex gained practical skills in experimental design, column handling, and understanding the selectivity of various amine sensors. He also learned to optimise camera and lighting setups to overcome challenges posed by reflective liquid metals, and explored the versatility of Kineticolor’s analysis methods. The project reinforced key chemistry concepts such as redox reactions, electrochemistry, and precipitation, deepening his theoretical and practical understanding.

A Fast and Justifiable AI approach to characterising hydrogen bonds

Jack Gallimore
Host University: University of Liverpool
Host Academic: Dr. Olga Anosova

The aim of this project was to develop a more transparent and geometrically justified method for identifying hydrogen bonds in proteins, with a focus on detecting and classifying common secondary structures like helices. Unlike traditional tools such as DSSP, the approach sought to eliminate reliance on manually set thresholds, enabling robust and consistent analysis across all experimental structures in the Protein Data Bank.

At the start of Jack’s project, he explored various geometric features to reliably identify hydrogen bonds in protein structures, focusing on relationships between residues i and i+3, i+4, and i+5. After analysing distributions across the Protein Data Bank, he established a transparent rule based on oxygen–nitrogen distance and C–O–N angle, which showed strong predictive power for secondary structure formation. He plans to continue refining these thresholds with Dr Olga Anosova, aiming to publish the work later this year.
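The two geometric features named above can be computed directly from atomic coordinates. The sketch below is a minimal illustration with hypothetical function names and made-up coordinates; it deliberately sets no thresholds, since the rule's actual cut-offs are not stated here.

```python
import math


def angle_deg(a, vertex, b):
    """Angle a-vertex-b in degrees."""
    v1 = [x - y for x, y in zip(a, vertex)]
    v2 = [x - y for x, y in zip(b, vertex)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cosine))


def hbond_features(c_i, o_i, n_j):
    """O-N distance and C-O-N angle for a carbonyl (residue i) and an amide N (residue j)."""
    return math.dist(o_i, n_j), angle_deg(c_i, o_i, n_j)


# Made-up coordinates in angstroms, chosen so the three atoms are collinear.
distance, angle = hbond_features(
    c_i=(0.0, 0.0, 0.0), o_i=(1.23, 0.0, 0.0), n_j=(4.23, 0.0, 0.0)
)
print(distance, angle)
```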

Jack learned to critically structure and produce research-grade work while handling noisy, large-scale data spanning 100,000 PDB files. He gained practical experience with Python libraries and efficient coding techniques, deepening his understanding of protein geometry and hydrogen bonding through hands-on analysis and visualization.

Computational Design of New Chiral Metal Halide Semiconductors Using High-Throughput and Machine Learning

Menna Shirras
Host University: Lancaster University
Host Academic: Dr. Nourdine Zibouche

This project aimed to design new chiral metal-halide semiconductors (CMHS) using machine learning techniques. This involved exploring chemical space with USPEX, compiling structural data, performing DFT calculations, applying supervised ML models to predict bandgaps, and refining promising candidates through high-level computations.

A curated library of 136 CMHS structures was compiled from literature and expanded through substitutions, with additional hypothetical structures attempted via USPEX, though limited by time and computational resources. Electronic data was generated using loose SCF calculations in Quantum ESPRESSO, and inconsistencies in literature data led to the use of a larger, cleaner dataset of 240 organometal halide perovskites for machine learning analysis. Three supervised regression models (Linear Regression, Decision Tree, and Random Forest) were implemented and tuned using scikit-learn to predict bandgap values, with performance evaluated using MAE, MSE, and R² metrics.
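For reference, the three evaluation metrics can be computed in a few lines without any library. The bandgap values below are hypothetical and only illustrate the arithmetic.

```python
def regression_metrics(y_true, y_pred):
    """Mean absolute error, mean squared error, and coefficient of determination."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    return {"MAE": mae, "MSE": mse, "R2": 1.0 - ss_res / ss_tot}


# Hypothetical bandgaps in eV (illustrative only).
metrics = regression_metrics(
    y_true=[1.0, 2.0, 3.0, 4.0],
    y_pred=[1.5, 2.0, 3.0, 3.5],
)
print(metrics)
```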

The project allowed Menna to gain practical experience with Python libraries and machine learning algorithms, applying them to materials design and data analysis. She developed strong skills in data visualization and model evaluation, and now aims to expand her knowledge by exploring unsupervised ML techniques and applying her models to larger datasets.

Develop an AI data audit tool to help with data ingestion for the Physical Data Science Data collection

Oscar Robinson
Host University: University of Southampton
Host Academic: Dr. Matthew Partridge and Dr. Samantha Pearman-Kanza

The aim of this project was to perform an initial investigation into the development of an AI-driven data audit tool to streamline the ingestion of datasets into the Physical Chemistry Properties Data Collection (PChProp). The tool needed to be able to analyse submitted datasets, identify inconsistencies or missing metadata, and propose corrections or canonical mappings to align with community standards and support ingestion to the database where the collection is hosted.

Oscar’s project produced a terminal-driven Python script called SAM (the Solubility Audit Manager). This tool is a command-line chatbot that uses a local HuggingFace LLM, heuristics, and natural-language parsing to guide the user through the ingestion stages of building a solubility database.

He also learnt to write idiomatic (“pythonic”) code, moving from Jupyter Lab and .ipynb notebooks to multi-file projects assembled in PyCharm, and gained experience applying local language models, natural language parsing, and regular expressions to import and process chemical data.

Exploring materials space with optimal transport

Zibo Zhou
Host University: University College London
Host Academic: Dr. Keith Butler

The project aimed to develop a systematic, data-driven framework for analogy-based materials discovery by integrating structural and compositional data using fused Gromov-Wasserstein (FGW) distance. This approach was specifically applied to estimate the spectroscopic limited maximum efficiency (SLME) of materials, enabling more informed predictions in materials science.

Zibo’s project demonstrated that the fused Gromov-Wasserstein (FGW) approach performs well in materials discovery tasks with small training sets, achieving results comparable to advanced models like ALIGNN and CrabNet. FGW’s predictive accuracy was highly dependent on the choice of reference material, making it especially useful when data are scarce but reliable predictions are needed. In contrast, ALIGNN and CrabNet showed superior performance as training set size increased, offering practical guidance for model selection based on data availability.

Through this project, Zibo gained a strong understanding of optimal transport theory and the fused Gromov-Wasserstein (FGW) distance, learning how it integrates structural and compositional data for materials prediction. He developed insight into the SLME property in photovoltaics and learned to evaluate model performance using various metrics across different training set sizes. His coding and computational skills improved significantly, enabling him to build efficient data pipelines and analyse large datasets. Additionally, he strengthened his collaboration and project management skills while working within a research group.

Human-AI Collaborative Closed-Loop Optimization Online Platform Development

Maxime Atkinson
Host University: University of Liverpool
Host Academic: Dr. Xenofon Evangelopoulos

The project focused on gaining an introduction to machine learning for closed-loop chemical discovery, with an emphasis on algorithmic benchmarking and orientation within the chemical discovery process.

Maxime developed and applied machine learning techniques, including Bayesian optimization and neural networks, to support chemical discovery tasks. He built and tested optimization algorithms using various surrogate models and acquisition functions on both real and synthetic datasets. Additionally, he implemented Physics-Informed Neural Networks (PINNs) for voltammetry modeling and produced functional codebases for predictive modeling, optimization loops, and data analysis in chemistry.
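As one example of an acquisition function, expected improvement (EI) scores a candidate point from the surrogate model's predicted mean and standard deviation. The sketch below assumes a Gaussian posterior and a minimisation problem; it is a generic textbook form, not the project's code, and the `xi` exploration offset is an optional assumption.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over `best` for minimisation under a Gaussian posterior."""
    improvement = best - mu - xi
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is either certain or zero.
        return max(improvement, 0.0)
    z = improvement / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return improvement * cdf + sigma * pdf


# A candidate predicted to match the incumbent still has value if it is uncertain.
print(expected_improvement(mu=0.0, sigma=1.0, best=0.0))
print(expected_improvement(mu=0.0, sigma=0.1, best=0.0))
```

In a closed loop, the next experiment would be the candidate with the highest EI, after which the surrogate is refitted with the new observation.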

Through this project, he gained mastery of key tools including Git, VS Code, scikit-learn, BoTorch, Keras, and TensorFlow, while developing a deep understanding of Bayesian optimization, surrogate models, acquisition functions, and constraints. He learned to structure machine learning projects from data preprocessing to deployment and built skills in neural network architectures, ensemble methods, and Physics-Informed Neural Networks (PINNs) for scientific applications. Additionally, he gained insights into voltammetry’s chemistry and mathematics, as well as valuable experience in research collaboration and academic career development.

Impact of finer k-point sampling on vibrational free energy and crystal structure ranking accuracy

Leo Arogundade
Host University: University of Southampton
Host Academic: Prof. Graeme Day

Leo’s project built on his third-year dissertation by exploring how finer k-point sampling influences vibrational free energy calculations in crystal structure prediction (CSP) workflows. By improving the accuracy of energy data, the work sought to improve the reranking of predicted crystal structures and assess the trade-off between computational cost and prediction reliability.

The project successfully extended the analysis to an additional 70 crystal landscapes, significantly broadening the dataset beyond the initial dissertation work. The results showed that 16 crystals experienced an improvement in their ranking. In contrast, 29 crystals saw a worsening of their ranks, while 25 crystals maintained their original rankings. This highlighted the nuanced role of vibrational energy in crystal stability, with polymorphs showing the most inconsistent rerankings due to their subtle energetic differences. These findings suggest that while refined calculations can enhance accuracy, they may also introduce variability, especially for polymorphs. Overall, the project indicated that vibrational energy metrics could complement existing CSP workflows to improve prediction reliability.
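One way to tally such outcomes is to compare the rank of a target structure on its landscape before and after a free-energy correction. The sketch below assumes that "rank" means the position of the target when structures are ordered by energy; the function names, structure labels, and energies are all hypothetical.

```python
def rank_of(target, energies):
    """1-based rank of `target` when structures are ordered by increasing energy."""
    return 1 + sum(e < energies[target] for e in energies.values())


def classify_rerank(target, before, after):
    """Label the change in rank of `target` between two energy models."""
    old, new = rank_of(target, before), rank_of(target, after)
    if new < old:
        return "improved"
    if new > old:
        return "worsened"
    return "unchanged"


# Hypothetical landscape of three structures (energies in kJ/mol).
lattice_energy = {"observed": -100.0, "a": -101.0, "b": -99.0}
free_energy = {"observed": -102.5, "a": -101.5, "b": -100.0}

outcome = classify_rerank("observed", lattice_energy, free_energy)
print(outcome)
```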

Leo deepened his understanding of computational chemistry and crystal structure prediction theory, gaining new skills in vibrational energy calculations, data analysis, and scientific programming. He became confident in handling independent research challenges and learned how theoretical models and practical data intersect in materials discovery. Working with advanced computational methods also strengthened his ability to evaluate model performance and manage complex workflows.

Machine Learning acceleration of metadynamic simulations of antimicrobial peptides

Alexandre Peuch
Host University: Imperial College London
Host Academic: Dr. Jarvist Frost

Alexandre’s project aimed to accelerate metadynamic simulations of antimicrobial peptides (AMPs) using machine learning techniques.

In the first part of the project, he developed a Monte Carlo algorithm based on amino acid coupling efficiencies to predict likely byproducts in peptide synthesis, guiding experimental efforts to identify active compounds in impure samples. The second part involved running 32 metadynamics simulations using both traditional and machine learning force fields (AMBER99SB-ILDN and Grappa) to study folding and dimerisation of four AMPs in polar and non-polar solvents. The results revealed consistent trends but also discrepancies—particularly in dimerisation—highlighting the importance of bonded parameters and the potential of ML-based force fields in capturing complex aggregation behaviours.
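The byproduct-prediction idea can be sketched as a simple Monte Carlo simulation. The version below is written in Python rather than the Julia used in the project, and it rests on a strong assumption: each coupling step succeeds with a per-residue probability, and a failed step simply skips that residue, giving a deletion sequence. The efficiencies and the sequence are made up.

```python
import random
from collections import Counter


def simulate_synthesis(sequence, efficiency, n_trials=10_000, seed=0):
    """Count the products obtained when each coupling may fail independently."""
    rng = random.Random(seed)
    products = Counter()
    for _ in range(n_trials):
        # Keep a residue only if its coupling step succeeds (assumed failure model).
        chain = [res for res in sequence if rng.random() < efficiency[res]]
        products["".join(chain)] += 1
    return products


# Hypothetical per-residue coupling efficiencies.
efficiency = {"G": 0.99, "L": 0.97, "K": 0.90}
products = simulate_synthesis("GLKLK", efficiency)

# The most frequent entries suggest which byproducts to look for in an impure sample.
for product, count in products.most_common(3):
    print(product, count)
```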

Alexandre developed a wide range of computing skills during his project, including proficiency with the UNIX command-line, virtual environments, Git, and high-performance computing. He gained hands-on experience with Monte Carlo simulations using Julia and Molecular Dynamics simulations via GROMACS, as well as applying cutting-edge machine learning techniques to study antimicrobial peptides. Notably, he trained neural networks to generate machine-learned collective variables for efficient biasing of MD simulations. These experiences have strengthened his interest in computational chemistry and confirmed his intention to pursue a PhD in the field.

Machine Learning Enabled Discovery of Point Defects Qubits for Quantum Technologies

Atshaam Ashraf
Host University: Imperial College London
Host Academic: Dr. Alex Ganose

The aim of this project was to use machine learned interatomic potentials (MLIPs) to screen defect complexes that are suitable for use as point-defect qubits. This screening will ensure hosts that exhibit low defect energies, long coherence lifetimes and the required symmetries for optically addressable qubits will be identified and put forward for further study by experimental and computational groups in quantum technologies.

Atshaam’s project involved calculating defect formation energies for CrAl neutral substitution defects in aluminium oxide phases using MLIPs and various computational tools. While MLIPs showed good agreement with DFT for primitive cells, they produced unphysical negative energies for defect structures, even after GGA+U corrections. This revealed limitations in the MLIP training data, which lacked accuracy for localised defect states. The findings highlight the need to train MLIPs on hybrid functional data to improve their reliability beyond bulk thermodynamics.
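For a neutral substitution, the formation energy follows from the total energies of the defective and pristine supercells plus the chemical potentials of the exchanged atoms. The sketch below uses that standard expression with hypothetical numbers; charge-state and finite-size correction terms are omitted because the defect is neutral, and the function name is an assumption.

```python
def neutral_formation_energy(e_defect, e_bulk, added, removed, chem_pot):
    """E_f = E_defect - E_bulk - sum(mu of added atoms) + sum(mu of removed atoms)."""
    gained = sum(n * chem_pot[el] for el, n in added.items())
    lost = sum(n * chem_pot[el] for el, n in removed.items())
    return e_defect - e_bulk - gained + lost


# Hypothetical energies in eV for one Cr replacing one Al (illustrative only).
e_form = neutral_formation_energy(
    e_defect=-1000.0,
    e_bulk=-998.0,
    added={"Cr": 1},
    removed={"Al": 1},
    chem_pot={"Cr": -9.5, "Al": -3.7},
)
print(e_form)
```

A negative result from this expression for a defect in a stable host is usually a sign that one of the input energies is unreliable, which is the kind of symptom described above.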

Through this project, he gained hands-on experience with key tools in computational materials chemistry, including the Materials Project API, doped and ShakeNBreak packages, and VASP for DFT calculations on Imperial’s HPC cluster. He learned how machine-learned interatomic potentials (MLIPs) can significantly reduce computational cost and accelerate research workflows. The project also introduced him to defect chemistry, deepening his understanding of material properties and their theoretical foundations. As his first independent research experience, it strengthened his Python, data analysis, and problem-solving skills, and confirmed his interest in pursuing a research career in atomistic simulations using machine learning.

Machine Learning Force Fields for CO₂ Adsorption and Reduction in Porous Solids

Dongin Kim
Host University: Imperial College London
Host Academic: Prof. Aron Walsh

Dongin’s project aimed to create a general, descriptor-driven framework to connect homogeneous and heterogeneous catalysis by aligning electronic properties. Using the d-band centre of a homogeneous Rh–phosphine complex as a benchmark, the project designed Rh–P nanoparticles with matching surface electronic structures. This was achieved through a closed-loop workflow combining ML-accelerated molecular dynamics, DFT calculations, bonding analysis, and catalytic testing, all scripted for reproducibility and future application to other material systems.

The project found that the surface d-band centre deviation is the most reliable predictor of catalytic performance, outperforming geometric metrics. Using this insight, Rh₃P was identified as the optimal composition, surpassing previously favoured Rh₂P due to better electronic alignment with the homogeneous benchmark. Mechanistic analyses (Bader charge, XPS, COHP/ICOHP, PDOS) consistently showed that Rh₃P offers balanced bonding and near-optimal adsorbate interaction. Rh-rich compositions failed due to electronic phase separation and excessive Rh–Rh delocalisation. The team also delivered a reproducible ML-MD + DFT pipeline and proposed a transferable “electronic alignment” design principle for heterogeneous catalysts.

Dongin gained practical experience in applying AI-based molecular dynamics to a real-world catalytic design problem. He learned to use machine learning force fields (MACE and MatterSim) within ASE to run annealing–quenching simulations and validate structures before DFT analysis, deepening his understanding of the strengths and limitations of current ML potentials. He also developed skills in building reproducible computational workflows, integrating ML-MD, DFT, and electronic structure analysis into a coherent pipeline. This included automating nanosphere generation, MD runs, and figure preparation with statistical annotations. Overall, Dongin came to appreciate how AI-enabled simulations can accelerate catalyst discovery by bridging exploratory structure generation with rigorous quantum chemical analysis.

Multi-Objective Bayesian Optimisation Approach Towards Advancing Automated Liquid-Handling Platforms

Xiaojun Hu
Host University: Imperial College London
Host Academic: Prof. Becky Greenaway

The project aimed to apply Bayesian Optimisation (BO) to optimise solvent-handling parameters on the Opentrons OT-2, a robotic liquid-handling platform. A new evaluation metric, the Sum of Absolute Differences (SAD), was introduced to improve the accuracy and precision of solvent transfers. Building on this, the longer-term goal is to develop a fully automated closed-loop workflow that runs experiments with minimal human intervention. This approach enhances efficiency, throughput, and safety, marking a key step toward the next generation of chemistry laboratories.

Xiaojun conducted dispensing experiments with ethanol and chloroform on the Opentrons OT-2, identifying seven key design variables that influence solvent transfer accuracy. She introduced the Sum of Absolute Differences (SAD) as a new metric, which showed stronger correlation with ideal dispensing behaviour than traditional R² measures. Using Web-BO with a Latin Hypercube Sampling strategy, she successfully optimised ethanol dispensing within four iterations, while chloroform required further refinement due to its complex physical properties. In the final phase, she contributed to developing a fully automated closed-loop dispensing and weighing station, writing and testing custom protocols. This work lays the foundation for autonomous experimental workflows with minimal human intervention.
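The exact definition of SAD used in the project is not given here. As a rough sketch, assuming it is the sum of absolute deviations between measured and target dispensed volumes, the metric could be computed as follows. The function name and the volumes are hypothetical.

```python
def sum_of_absolute_differences(measured, target):
    """Sum of |measured - target| over paired dispensing measurements.

    Assumed definition for illustration; lower values mean the dispensed
    volumes sit closer to the intended ones.
    """
    if len(measured) != len(target):
        raise ValueError("measured and target must have the same length")
    return sum(abs(m - t) for m, t in zip(measured, target))


# Hypothetical volumes in microlitres, not data from the project.
target_ul = [50.0, 100.0, 150.0, 200.0]
measured_ul = [48.5, 101.0, 147.0, 204.0]

sad = sum_of_absolute_differences(measured_ul, target_ul)
print(f"SAD = {sad:.1f} uL")
```

Unlike R², which only rewards a linear trend, a metric of this kind penalises any offset from the intended volume, which may be why it tracked ideal dispensing behaviour more closely.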

Through this project, Xiaojun learned to operate and troubleshoot the Opentrons OT-2, design fair performance metrics, and apply Bayesian Optimisation for experimental parameter tuning. She gained experience in collaborative coding workflows using Python, VS Code, and GitHub, and developed confidence in presenting results to a research group. Working closely with the Greenaway team, she learned to collaborate effectively and engage in a dynamic research environment. Under the guidance of doctoral researchers Sean Gurung and Alex Ostudin, and with technical support from Dr. Austin Morz, she gained valuable exposure to Web-BO. Xiaojun is especially grateful to Prof. Becky Greenaway for the opportunity to explore the field of computational chemistry.

NeuralBind: Enhancing chemical coverage and diversity of training data for binding-affinity predictions.

Savva Grevtsev
Host University: University of Oxford
Host Academic: Prof. Philip Biggin

The aim of this project was to train ML models (RFscore, EHIGN, AEV-PLIG) on various real-world and synthetic datasets, evaluate their performance, infer what the models learn and how well they generalise, and assess the feasibility of using synthetic data in binding affinity prediction.

Savva contributed as second author to a published paper (arXiv:2507.07882) and made significant contributions to multiple codebases. The project revealed that both synthetic and real-world datasets in molecular modelling suffer from inconsistent quality, and many commonly used benchmarks and tools are flawed. Despite advances in model architectures, performance has largely plateaued, with GNNs showing modest improvements due to better inductive biases. While synthetic data can help, stringent quality control is essential: large volumes of mediocre data offer little benefit. The findings suggest that future progress in the field will depend heavily on large-scale, high-quality data efforts, unless a new modelling paradigm emerges.

Savva gained deeper familiarity with Python packages, the Git version control system, and working with remote HPC clusters, while training and applying machine learning models, primarily random forests and graph neural networks, for binding affinity scoring. He developed skills in dataset curation and model performance evaluation, which are essential in bio/cheminformatics. The project also highlighted the widespread issue of poorly documented and unreliable codebases in the field, which he found frustrating, though he noted gradual improvements. Overall, the experience broadened his technical capabilities and reinforced the importance of clean, maintainable code in computational research.

Towards the Development of Asynchronous Solvent Handling Capabilities for Automated Liquid Handling

Jessica Lai
Host University: Imperial College London
Host Academic: Prof. Becky Greenaway

This project focused on improving the accuracy of the Opentrons OT-2 robotic liquid handler when dispensing volatile organic solvents, which often drip due to pressure build-up from water-based calibration. To tackle this, a custom function introducing an additional air-gap step was developed to mitigate dripping. This function was integrated into a closed-loop Bayesian optimisation workflow, allowing automated optimisation of pipetting parameters across various solvents. The approach reduces human bias, time, and labour compared with traditional manual optimisation.

Jessica successfully developed a custom midpoint function for the Opentrons OT-2, which introduced an air-gap step between the pipetting origin and destination to reduce solvent dripping. This function was optimised for volatile and heavy solvents like DCM, Et₂O, and CHCl₃, resulting in lower standard deviations and improved linearity in dispensing accuracy. She implemented a web-based Bayesian optimisation (Web-BO) workflow to fine-tune seven key pipetting parameters, using defined ranges and iterative feedback. A fully automated closed-loop system was established, integrating the OT-2, a DOBOT robotic arm, and a weighing balance to execute and evaluate dispensing tasks autonomously. The midpoint function enhanced the precision of the optimisation process, supporting more reliable and consistent solvent handling.

During this project, Jessica strengthened her Python programming skills by troubleshooting and developing a custom midpoint function, gaining confidence in error handling and collaborative coding. She expanded her technical knowledge by working with Linux, GitHub, and PowerShell, and learned to adapt her code within a team setting. The project introduced her to the Opentrons OT-2, where she explored its hardware and software architecture, including low-level API control. She also gained a solid understanding of Bayesian optimisation and its application in refining experimental conditions efficiently. Beyond technical skills, Jessica improved her presentation abilities and developed a deeper appreciation for consistency and teamwork in a research environment.

Transforming Undergraduate Learning. Migration of a Student Computational Drug Discovery Assignment to a Python Based Project.

Zhaohui Jiang
Host University: Manchester Metropolitan University
Host Academic: Dr. Alex Aziz

Zhaohui’s project aimed to transition a second-year computational drug discovery assignment from SCIGRESS, a proprietary tool, to an open-source Python-based workflow using Jupyter Notebook. The goal was to equip students with practical Python programming and data analysis skills while engaging them in virtual drug screening tasks. Using libraries like NumPy, Pandas, Matplotlib, and Scikit-Learn, the workflow introduces QSAR modeling, Lipinski’s Rule of Five, and IC₅₀ data analysis for HIV-1 protease compounds. He developed interactive teaching materials structured for delivery over five sessions, ensuring accessibility and reproducibility. Ultimately, the project supports modern, accessible computational chemistry education using open tools and real-world datasets.

He successfully developed a Python-based virtual drug screening pipeline, retrieving and cleaning IC₅₀ data for HIV-1 protease from ChEMBL, calculating molecular descriptors with RDKit, and applying Lipinski’s Rule of Five for initial filtering. He built and evaluated QSAR models using Scikit-Learn, finding that random forest regression outperformed linear regression in predictive accuracy. This model was then used to screen compounds from the SWEETLEAD database, identifying the top ten potential inhibitors. On the teaching side, Zhaohui created interactive Jupyter Notebook tutorials covering molecular visualization, cheminformatics, and machine learning, enhanced with ipywidgets and matplotlib. These deliverables provide a complete research-to-teaching workflow, supporting future interdisciplinary education in AI-driven drug discovery.
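The Lipinski filtering step can be sketched in a few lines. The project computed descriptors with RDKit; to keep this example dependency-free, it assumes the four descriptor values have already been calculated. The class, function names, and compound values are hypothetical, and allowing one violation is a common convention rather than something confirmed by the project.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria a compound breaks."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Keep a compound if it breaks at most `max_violations` criteria."""
    return lipinski_violations(d) <= max_violations


# Hypothetical compounds for illustration only.
small_molecule = Descriptors(mol_weight=320.4, logp=2.1, h_donors=2, h_acceptors=5)
bulky_molecule = Descriptors(mol_weight=720.9, logp=6.3, h_donors=4, h_acceptors=12)

for name, d in [("small", small_molecule), ("bulky", bulky_molecule)]:
    print(name, lipinski_violations(d), passes_rule_of_five(d))
```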

Through this project, he developed both technical and conceptual skills in applying Python and AI to chemistry. He learned to visualise and analyse chemical data using tools such as Py3Dmol, ChEMBL, and RDKit, exploring the relationship between molecular properties and pharmacological activity. Using Scikit-Learn, he built and compared QSAR models, gaining insight into model performance, feature selection, and data quality. Additionally, by creating interactive Jupyter Notebook tutorials, he enhanced his ability to communicate complex research processes and deepened his understanding of how AI can advance chemical research.

Benchmarking LLM-driven Scientific Reasoning in Closed-Loop Experiments

Maxime Atkinson
Host University: University of Liverpool
Host Academic: Dr. Olga Anosova

This project will evaluate whether large language models can perform as well as, or better than, established optimisation methods in autonomous materials discovery. Using existing photocatalytic hydrogen production data, the project will create a virtual benchmark environment in which different discovery strategies can be tested repeatedly under controlled and reproducible conditions.

The internship will focus on building a calibrated machine learning “virtual oracle” and then comparing Bayesian optimisation, hybrid Bayesian–LLM approaches, human-in-the-loop methods, and LLM-only strategies. The resulting statistical analysis will provide evidence on the strengths and limitations of LLM-based scientific reasoning in closed-loop discovery workflows.

Data-driven approaches for solubility prediction in the material sciences

Adithi Rajesh
Host University: Imperial College London
Host Academic: Prof. Kim Jelfs

This project will explore how computational chemistry and machine learning can be used to predict the solubility of small organic molecules and polymers, a key challenge in materials discovery. By calculating and comparing molecular descriptors against experimental datasets, the project will investigate which features best explain solubility behaviour and how predictive models can support the design of more processable materials.

The work is expected to generate new datasets and predictive models, while also examining the interpretability of these methods to provide physical insight rather than black-box predictions alone. In doing so, the project will contribute to efforts to strengthen the link between computational materials design and successful experimental outcomes.

Disentangling Operando Spectroscopy with Deep Learning

Meng Ip Liu
Host University: University College London
Host Academic: Dr. Keith Butler

This project will use deep learning to analyse complex operando spectroscopy data from electrocatalysis experiments. These datasets often contain overlapping signals from multiple species and processes, making it difficult to identify reaction intermediates or mechanistic changes using conventional approaches. The project will address this by applying disentangled autoencoders to learn compact and interpretable representations of the data.

By separating hidden factors such as composition, oxidation state, or local reaction environment, the project aims to reveal chemically meaningful trends and identify unusual or short-lived spectral signatures. This could provide a powerful new way of turning large, complex spectroscopy datasets into mechanistic understanding.

Domain-Specific AI for Experimental Data Analysis in Virtual Reality Laboratory Environments

Eima Miyasaka
Host University: University College London
Host Academic: Prof. Stephen Hilton

This project will develop a domain-specific AI system for analytical chemistry within a virtual reality laboratory environment. Building on UCL’s Lab 427 digital twin platform, the project aims to move beyond conversational guidance by creating an AI tool capable of performing meaningful analysis of experimental outputs, such as chromatographic data and flow chemistry parameters, directly within VR.

The work will involve assembling a chemistry-focused training corpus, fine-tuning an open-source language model using efficient methods, and deploying the resulting system as a cloud-based inference tool. Its performance will then be assessed against general-purpose AI models on chemistry-specific analytical tasks, helping to define how specialised AI can enhance immersive scientific training.

Interpretable discovery of functional organic molecules for nanotechnology applications

Oliver Jarvis
Host University: University of Warwick
Host Academic: Assoc. Prof. Zsuzsanna Koczor-Benda

The project will investigate how interpretable machine learning can support the discovery of organic molecules for nanotechnology applications such as thermoelectrics, light detection, and bioimaging. While generative AI can propose new candidates, many suggested molecules are too complex, unstable, or impractical for experimental use. This project will focus on extracting design rules and structure–property relationships that can guide more realistic molecular design.

Using supervised learning and statistical analysis, the work will interpret outputs from generative models and test whether the learned rules can be applied to improve molecules already known in related applications. The project aims to deliver Python-based tools for analysing generative design workflows and identifying promising, experimentally relevant candidates.

Large Language Models for Generating a Universal Calibration Curve for Polymer Diffusion

Henry Yarley
Host University: University of Manchester
Host Academic: Dr. Robert Evans

This project will investigate whether large language models can be used to extract reliable polymer diffusion data from the scientific literature and turn it into a high-quality dataset for model development. By combining careful prompt engineering with validation of extracted values, the project aims to overcome longstanding challenges in gathering consistent diffusion data across polymers and solvents.

Using this literature-derived dataset, the project will develop models linking polymer molecular weight to diffusion coefficients, with the goal of producing a more universal calibration curve for polymers in solution. If successful, the work could demonstrate a new role for LLMs in rigorous scientific data extraction while also advancing polymer chemistry research.
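One simple form such a calibration curve could take is a power law, D = K·M^(−ν), fitted by linear regression in log-log space. The power-law form is an assumption made here for illustration and is not stated as the project's chosen model. The function names and the synthetic data are likewise hypothetical.

```python
import math


def fit_power_law(mol_weights, diff_coeffs):
    """Least-squares fit of D = K * M**(-nu) in log10-log10 space.

    Returns (K, nu). Assumes all inputs are positive.
    """
    xs = [math.log10(m) for m in mol_weights]
    ys = [math.log10(d) for d in diff_coeffs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** intercept, -slope


def predict_diffusion(k, nu, mol_weight):
    """Evaluate the fitted calibration curve at a given molecular weight."""
    return k * mol_weight ** (-nu)


# Synthetic data generated from a known power law (not literature values).
true_k, true_nu = 1e-8, 0.5
weights = [1e3, 1e4, 1e5, 1e6]                      # g/mol
coeffs = [true_k * m ** (-true_nu) for m in weights]  # m^2/s

k_fit, nu_fit = fit_power_law(weights, coeffs)
print(f"K = {k_fit:.3e}, nu = {nu_fit:.3f}")
```

With real literature-extracted data the points would scatter around the line, and the spread of the residuals would give a first indication of how universal a single curve can be across polymers and solvents.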

Large Language Models vs Bayesian Optimisation: Toward Self-Driving Electrolyte Discovery

Sri Nikesh Kamma Chaval
Host University: Imperial College London
Host Academic: Dr. Emma Antonio

This project will compare two AI-driven strategies for battery electrolyte optimisation: conventional Bayesian optimisation and a large language model-based approach. Working within Imperial’s DIGIBAT high-throughput facility, the project will combine automated experimentation with algorithmic decision-making to explore the complex formulation space of electrolyte systems for coin cell batteries.

After initial training and benchmarking of baseline cell performance, the project will implement both optimisation frameworks and apply them iteratively to improve electrolyte compositions for a chosen electrode chemistry. The work is expected to generate insight into the effectiveness, robustness, and practical limitations of different AI approaches for experimental battery research, while contributing to the development of self-driving laboratories.

Machine learning enabled lithium-ion electrolyte chemical composition identification through Electrochemical Impedance Spectroscopy (EIS) data

Fuzhi Li
Host University: Imperial College London
Host Academic: Dr. Derek Siu

This project will develop an AI-driven framework to identify lithium-ion battery electrolyte compositions from Electrochemical Impedance Spectroscopy data. Since electrolyte formulations are rarely disclosed by manufacturers, this work addresses an important information gap by combining battery testing, electrochemical analysis, and machine learning to connect EIS signatures with electrolyte–electrode interactions.

It will build a benchmark database of battery cells with known components and use this to train models capable of recognising electrolyte formulations and predicting performance-related properties. In the longer term, the work could help guide the design of new electrolyte compositions tailored to specific battery requirements.

Machine Learning for Molecular Mechanics

Tobias Liang
Host University: University of Edinburgh
Host Academic: Prof. Julien Michel

This project will investigate how machine learning potentials can be combined with molecular mechanics to enable accurate yet computationally efficient atomistic simulations. In particular, the work will explore electrostatic machine learning embedding, a method that captures polarisation effects and could improve simulations of biomolecular systems beyond what is possible with standard force fields.

The project will also benchmark different machine learning potential architectures on datasets such as FreeSolv and QM7, assessing their speed, numerical stability, and accuracy in predicting hydration free energies. In addition, it will examine whether embedding parameters learned with one model can transfer effectively to another, contributing to the longer-term goal of developing more universal hybrid simulation methods.

Machine Learning Prediction of Absorption Spectra of Photoswitches for Solar Thermal Fuels

Doina Glavnenco
Host University: Queen Mary University of London
Host Academic: Dr. Federico Javier Hernández

This project will apply machine learning and computational photochemistry to predict the absorption spectra of molecular photoswitches for solar thermal fuel applications. Rather than focusing only on absorption maxima, the work will aim to model the full spectral shape and intensity, which are critical for designing efficient molecules capable of storing solar energy.

Starting from a large library of norbornadiene derivatives, the project will combine conformational sampling, quantum chemistry, and atomistic machine learning to build predictive models for excitation energies and oscillator strengths. The best candidates will then be assessed further for energy storage performance and stability, helping to accelerate the discovery of next-generation solar thermal fuel materials.

Multi-property peptide optimisation: from design to lab screening

Emilia Ute Marlene Fetzer
Host University: Imperial College London
Host Academic: Assoc. Prof. Anna Barnard

This project will address a major challenge in peptide drug discovery: designing molecules that not only bind effectively to biological targets but also possess favourable developability properties such as solubility and stability. The project will combine AI-guided peptide design with experimental testing to generate high-quality datasets that can be used to evaluate and improve predictive models.

Peptides will be synthesised and characterised using a range of automated and analytical techniques, creating a robust experimental foundation for benchmarking AI methods and identifying candidates with co-optimised properties. This work will help bridge the gap between computational design and experimentally viable therapeutic molecules.

2024

Accelerating Materials Discovery: Integrating Machine Learned Force Fields (MLFF) with Monte Carlo Simulations

Jay Zhou – University of Bath
Steve Parker – University of Bath
Tom Underwood – STFC, RAL

Project Summary

Recently, the use of Machine Learned Force Fields (MLFFs) for Molecular Dynamics (MD) simulations has become popular. However, this is not the case for Monte Carlo (MC), which performs exceptionally well for systems such as gases, complex mixtures and adsorption. In fact, the machinery for MLFF-integrated MC is currently absent from open-source platforms. Therefore, the goal of this project is to develop an open-source software framework that integrates MLFFs with MC simulations to enhance the calculation of various thermodynamic properties of solid-state materials. By leveraging MLFFs trained on ab-initio data, we aim to improve the accuracy of MC simulations in predicting free energy properties, while offering significantly faster compute speed than ab-initio methods.

This approach addresses the limitations of classical force fields (CFFs), which are traditionally used to describe the interactions between atomic and molecular species in MC simulations but often fall short in accuracy. With this project, we commit to programming a reliable interface between the Monte Carlo simulation software package DL_MONTE (a member of the Daresbury Lab software suite) and MLFFs in the form of universal Python functions (callable by ASE). We will also test this interface extensively by using the CHGNet neural network force field to calculate thermodynamic properties of water via Grand Canonical Monte Carlo (GCMC). The final code package will be released as open source alongside detailed documentation and tutorials to attract community engagement and fast-track the adoption of AI in the MC simulation community.
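The idea of plugging an arbitrary energy function into a Monte Carlo loop can be sketched as follows. This is not the DL_MONTE interface: it is a minimal, generic Metropolis displacement move in which `energy_fn` stands in for whatever MLFF would supply energies. It uses a canonical-ensemble move rather than the grand canonical insertions and deletions the project targets, and the toy harmonic energy and all names are hypothetical.

```python
import math
import random

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def metropolis_accept(delta_e, temperature, rng=random):
    """Accept a move with probability min(1, exp(-delta_e / (k_B * T)))."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / (K_B_EV * temperature))


def mc_displacement_step(positions, energy_fn, temperature, max_step, rng=random):
    """Try moving one randomly chosen particle; return (positions, accepted)."""
    i = rng.randrange(len(positions))
    trial = [list(p) for p in positions]
    trial[i] = [x + rng.uniform(-max_step, max_step) for x in trial[i]]
    delta_e = energy_fn(trial) - energy_fn(positions)
    if metropolis_accept(delta_e, temperature, rng):
        return trial, True
    return positions, False


def toy_energy(positions):
    """Stand-in for an MLFF call: harmonic wells centred on the origin (eV)."""
    return sum(0.5 * sum(x * x for x in p) for p in positions)


rng = random.Random(0)
config = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
n_accepted = 0
for _ in range(1000):
    config, accepted = mc_displacement_step(config, toy_energy, 300.0, 0.1, rng)
    n_accepted += accepted
print(f"acceptance ratio: {n_accepted / 1000:.2f}")
```

Because the sampler only ever calls `energy_fn`, swapping a classical force field for a machine-learned one changes the accuracy and cost of each step but not the Monte Carlo logic itself.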

Project Outcomes

The project delivered a functional interface linking machine-learned force fields with Monte Carlo simulations via the DL_MONTE platform. This enables ab initio-level accuracy for simulations of gases and materials at significantly reduced computational cost. The developed server-client framework is flexible, scalable, and compatible with multiple MLFF tools, with open-source code and documentation provided. This represents a key step towards broader adoption of ML-driven simulation techniques in materials chemistry.

Next Steps

The team plans to publish the methodology and improve computational efficiency by securing additional GPU resources. Future work will focus on overcoming current performance limitations to enable more accurate and scalable simulations.

AI-Enabled Prediction of Lipid Membrane Composition from Optical Signatures

Dr. Miguel Paez Perez – Imperial College London
Prof. Marina Kuimova – Imperial College London

Project Summary

Lipid membranes play a key role in biology; they provide structural integrity, are central in intercellular communication, control content exchange, and transduce extracellular signals. Their functionality arises from their unique structure and biophysical properties, which are dictated by the interaction between the membrane’s constituent lipid molecules. Deregulation of these lipid-lipid interactions has been linked to diseases including cancer, malaria, Alzheimer’s disease, and atherosclerosis. From a commercial perspective, membrane composition and lipid-lipid interactions influence the efficacy of lipid-based drug carriers, such as miRNA vaccines. Yet, there is a limited understanding of how the lipid composition of complex, multi-component membranes affects their biophysical behaviour.

To address this challenge, we are developing tools capable of monitoring the biophysical features of lipid bilayers with high throughput and at low cost. We will leverage an in-house high-throughput vesicle production device, optical readouts, and AI tools to unlock otherwise inaccessible insight into how the chemistry of complex lipid membranes dictates their biophysical properties.

This project will generate a publicly available, curated dataset to facilitate the development of data-driven biophysical models, and we anticipate its outputs will find applications in areas including antimicrobial resistance research and artificial cell development, supporting key UK strategic areas like engineering biology.

Project Outcomes

This project established a proof-of-concept for “optical lipidomics,” combining fluorescence-based sensing with AI to classify lipid membranes. By using multiple environmentally-sensitive dyes, the team generated unique fluorescence fingerprints for biological samples, including cancer-derived vesicles. A novel vesicle production technology (“Spinlex”) was also developed to accelerate data generation. The project led to early-stage funding success and supported significant researcher upskilling in AI methods, alongside progress towards patenting the technology.

Next Steps

The team is expanding collaborations to apply this approach to antimicrobial resistance and cancer diagnostics. Future plans include pursuing funding from BBSRC, CRUK, and UKRI, as well as supporting fellowship applications to further develop and translate the technology.

AI-Enhanced Molecular Dynamics: Integrating Long-Range Interactions with Graph Neural Networks

Dr. Devis Di Tommaso – Queen Mary University of London
Assoc. Prof. Rachel Crespo-Otero – University College London
Prof. Greg Slabaugh – Queen Mary University of London

Project Summary

Molecular Dynamics (MD) is an essential computational tool for studying atomistic-level phenomena. However, methods like ab initio MD, which rely on density functional theory (DFT) to compute energies and forces, are computationally expensive. On the other hand, methods based on classical interatomic potentials (IP) offer speed but lack flexibility. In recent years, machine learning (ML) approaches have promised DFT-quality simulations at faster speeds, but they often neglect long-range interactions, which are crucial for accurately describing liquids, gas-solid and liquid-solid systems, and biomolecular systems. This project aims to develop AI-enhanced MD methods that integrate long-range interactions for predicting both energies and forces. The outcome will be the first version of a code implementing a Molecular Graph Transformer (MGT) to partially or fully replace the costly and time-intensive traditional computational methods used at each timestep of MD simulations. The model will compute atomic forces, energies, and changes in atomic positions throughout the simulation, enabling a more efficient and scalable approach to studying molecular systems. This will facilitate accelerated atomistic MD simulations that account for both local and long-range interactions.

Project Outcomes

The project developed a novel equivariant graph neural network architecture (BAEE) that integrates long-range electrostatic interactions into molecular dynamics simulations while maintaining computational efficiency. This represents a significant methodological advancement beyond the original Molecular Graph Transformer concept. A manuscript detailing the framework is in preparation, with planned submission in early 2026. The work also strengthened interdisciplinary collaboration and upskilling across AI and computational chemistry, with foundations laid for future open-source release.

Next Steps

The team will further develop and benchmark the method across diverse datasets, followed by open-source dissemination. Plans include submitting an EPSRC proposal and advancing the GRIP framework for inverse materials design in collaboration with industry partners such as IBM.

Bayesian Optimization for Accelerating Metal-Based Antibiotic Discovery

Dr. Angelo Frei – University of York
Dr. David Husbands – University of York
Dr. Athi Welsh – University of York

Project Summary

Antimicrobial resistance to current treatments poses a growing threat to global healthcare. At the same time, the antibiotic development pipeline remains perilously stagnant. This project aims to accelerate the discovery of novel metal-based antibiotics by integrating machine learning (ML) with high-throughput chemical synthesis and biological evaluation. Collaborating with Atinary Technologies, we will leverage Bayesian Optimization to train ML models on our chemical libraries to predict and iteratively refine iridium(III) metalloantibiotics, maximizing antibacterial potency while minimizing toxicity.

Metal-based antibiotics offer unique structural and functional advantages over organic compounds, yet their discovery remains slow, partly due to the vast chemical space available. Traditional methods of structure-activity relationship elucidation are time-intensive and inefficient. By training an ML model on 1440 iridium(III) complexes, we will virtually screen ~400 million potential compounds from combinations of building blocks, dramatically enhancing the speed of hit identification and the hit rate. From this virtual screen, two iridium(III) libraries will be synthesized and evaluated using an automated Opentrons system.

Overall, this project is anticipated to yield the following outcomes:

Identification of 10 lead iridium(III) complexes with high antibacterial activity and low cytotoxicity for further biological evaluation.
Development of an ML-driven approach to optimize metal-based drug discovery.
Strengthening of industry-academic collaboration with Atinary Technologies.
Publication of research findings to advance AI-driven drug discovery.

By applying AI to chemical innovation, this project sets the stage for future applications in metallodrug development for other ailments.

Project Outcomes

This project successfully applied AI-driven Bayesian optimisation to explore a vast chemical space of iridium(III) complexes for antibiotic discovery. In collaboration with Atinary, the team developed a novel predictive model trained on bioactivity data, enabling efficient screening across 400 million potential compounds. Two iterative design cycles led to the synthesis and evaluation of 360 new complexes, identifying several promising candidates with strong antibacterial activity and acceptable toxicity. The project also provided valuable upskilling opportunities through direct engagement with an AI industry partner. A research publication is currently in preparation.

Next Steps

Future work will focus on expanding datasets to enable predictions across a wider range of metals and improving model robustness. The team is exploring further funding opportunities, including ERC grants, to continue collaboration with Atinary and advance this approach.

Computer Vision for Predicting the Impact of Additives in Protein Crystallisation

Prof. Bao Nguyen – University of Leeds
Dr. Briony Yorke – University of Leeds
CaiYun Ma – University of Leeds
Dr. Halina Mikolajek – Diamond Light Source

Project Summary

The rise of new drug discovery modalities has underscored the need for efficient macromolecular crystallization, both as a key characterization technique and as a greener alternative to traditional purification methods in manufacturing. However, the weak intermolecular interactions in protein crystals often lead to instability, necessitating the use of additives that influence protein binding, ionic strength, or nucleation. While standardized crystallization screens are widely used, the underlying intermolecular interactions and nucleation/growth mechanisms remain poorly understood, with only limited systematic studies on a small subset of proteins.

This project aims to address these challenges using AI-driven computer vision to classify and extract morphological data from microscopic images of protein crystallization screens, sourced from the VMXi beamline at Diamond Light Source. The outcome will be an automated workflow for crystal characterization across diverse sources, a robust dataset of microscopic images, including negative results, and AI models trained to predict optimal crystallization conditions and additives. By improving crystallization success rates, this approach advances both protein characterization and scalable purification. Furthermore, the same AI-assisted image processing techniques can be extended to other materials, broadening their impact beyond biomolecular systems.

Project Outcomes

This project focuses on improving protein crystallisation analysis using artificial intelligence, through two main stages: data curation and model development. In the first stage, we curated and annotated over 3,000 microscopy images from protein crystallisation experiments at Diamond Light Source. Each image was labelled with detailed experimental metadata and crystal characteristics, creating one of the most comprehensive datasets of its kind. This dataset is available to the community, and will be open-access when the manuscript is published.

In the second stage, the dataset was used to train and evaluate AI models for crystal detection. Initial results are promising, achieving 83% classification accuracy. More advanced approaches using large-scale transfer learning were explored but require further computational resources to complete.

Next Steps

Looking ahead, we are seeking additional HPC resources to continue model development and improve performance. Collaborations with AI experts and partners are in place to support the next phase, which will focus on developing more advanced models to predict crystallisation success based on protein structure and experimental conditions.

Further work will also involve engagement with industry partners to expand the project and accelerate real-world application. Additional publications and continued development are planned.

Enabling Data-Driven Discovery and Reaction Optimisation in Porous Organic Cage Synthesis

Dr. Benjamin D Egleston – Imperial College London
Dr. Rebecca Greenaway – Imperial College London

Project Summary

Porous Organic Cages (POCs) are a class of molecular materials with tunable micropore structures that offer significant potential in separation technologies. Recently, our lab has been implementing machine learning tools to assess the accessibility of POCs by encoding chemists’ intuition (doi.org/10.1021/acs.jcim.1c00375). However, traditional methods for synthesising these materials and analysing their reaction mixtures are limited due to the complexity of self-assembly and the unintuitive nature of the species formed. To address this challenge, the project will integrate robotic liquid handling and parallel synthesis with automated data processing and analysis to enable generation of large experimental datasets – providing structural information and thermodynamic data for these complex systems for machine learning or data-driven applications.

Building on recent progress in automated high-throughput screening for combinatorial synthesis of metal-organic cages (doi.org/10.26434/chemrxiv-2024-hl427-v4) and POCs (doi.org/10.1039/D3SC06133G), the project extends these methodologies to even more complex systems. The project will be centred around identifying unintuitive structures and intermediates in reaction mixtures by generating large libraries of potential molecules and matching them against mass spectrometry (MS) data. This will be combined with automated kinetic sampling and analysis of MS data in parallel reactions to enable mapping of entire reaction spaces.
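The idea of matching a generated library of possible species against MS peaks can be sketched as follows. This is a minimal illustration and not the project's tooling: it assumes imine condensation (one water lost per bond formed), [M+H]+ ions, and two illustrative building blocks (a trialdehyde and a diamine with the masses shown). The function names and the 10 ppm tolerance are likewise assumptions.

```python
WATER = 18.010565   # monoisotopic mass of H2O, lost per imine bond formed
PROTON = 1.007276   # mass added for an [M+H]+ ion

# Illustrative building blocks (assumed): mass and number of reactive sites.
ALDEHYDE_MASS, ALDEHYDE_SITES = 162.031695, 3   # a trialdehyde
AMINE_MASS, AMINE_SITES = 60.068748, 2          # a diamine


def candidate_library(max_aldehyde, max_amine):
    """Enumerate (n_aldehyde, n_amine, n_imine, m/z) for possible species."""
    library = []
    for a in range(1, max_aldehyde + 1):
        for b in range(1, max_amine + 1):
            min_bonds = a + b - 1  # fewest bonds that connect all fragments
            max_bonds = min(a * ALDEHYDE_SITES, b * AMINE_SITES)
            for n in range(min_bonds, max_bonds + 1):
                mass = a * ALDEHYDE_MASS + b * AMINE_MASS - n * WATER
                library.append((a, b, n, mass + PROTON))
    return library


def match_peak(mz, library, tol_ppm=10.0):
    """Return (n_aldehyde, n_amine, n_imine) hits, closest match first."""
    hits = []
    for a, b, n, lib_mz in library:
        error_ppm = abs(mz - lib_mz) / lib_mz * 1e6
        if error_ppm <= tol_ppm:
            hits.append((error_ppm, a, b, n))
    return [(a, b, n) for _, a, b, n in sorted(hits)]


library = candidate_library(max_aldehyde=4, max_amine=6)
print(match_peak(793.4198, library))
```

With these assumed masses, a peak at m/z 793.4198 is matched most closely by the species built from 4 aldehydes and 6 amines with all 12 imine bonds formed; partially condensed intermediates appear in the same library with fewer bonds.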

One key goal of this project is to demonstrate that automation of the discovery process, from reaction preparation to data interpretation, can accelerate the identification of novel materials. Generation of much greater volumes of detailed data will allow for a deeper understanding of these complex systems. The resulting data-driven foundation will accelerate discovery of novel POCs and other structures that are challenging to predict using traditional intuition.

Project Outcomes

This project advanced high-throughput experimentation for porous organic cage discovery by developing automated LC-MS workflows and data analysis tools. Two Python-based tools enabled efficient analysis of complex reaction mixtures and are already being adopted within the research group. The work facilitated large-scale screening of ~1,000 reaction combinations, leading to the identification of novel cage structures and supporting upcoming publications. The project also introduced automated sampling methods to generate time-resolved datasets, significantly enhancing data-driven discovery capabilities.

Next Steps

Future efforts will focus on integrating these workflows into self-optimising experimental platforms, including Bayesian optimisation approaches. Promising cage candidates will be scaled up and further characterised, alongside continued development of predictive models and collaborative research activities.

Exploration of defect superstructure phase diagrams in graphene with Bayesian AI

Dr. Lukas Hoermann – University of Warwick
Prof. Reinhard J. Maurer – University of Warwick
Dr. David Andrew Duncan – University of Nottingham
Dr. Alexander Saywell – University of Nottingham
Dr. Christopher Allen – University of Oxford

Project Summary

The atom-scale design of two-dimensional materials, particularly defective graphene, shows great promise for catalysis, sensing, and energy storage. By integrating experimental growth and analysis with Bayesian-AI-enabled configuration space prediction via the SAMPLE code, we will lay the groundwork for the experimental design to efficiently explore the phase diagram of defective graphene. This project will uncover how experimental parameters—temperature and gas flux—influence the formation of defect superstructures in graphene that govern its electronic and mechanical properties.

Using SAMPLE, we will generate a comprehensive phase space of hundreds of millions of defect superstructures and efficiently predict their formation energies. This dataset will be available on the NOMAD database. We will calibrate the theoretical phase diagram using TEM and AFM images of N-defects in graphene from our collaborators David Duncan, Alexander Saywell (University of Nottingham), and Christopher Allen (Diamond Light Source, University of Oxford). By mapping the experimental structures with a tessellation code (Duncan and Saywell) and computing their formation energies with SAMPLE, we will place these structures within the phase diagram. Using Bayesian-AI, we will learn the functional dependence of the N-concentration and defect composition on the deposition temperature and gas flux during sample preparation. This will enable the prediction of defect patterns at a given deposition temperature and guide future experiments to achieve graphene layers with targeted defect superstructures. The developed approach will be broadly applicable to any defective two-dimensional material or surface, offering a versatile framework for precision surface engineering in a range of applications.

Project Outcomes

We generated comprehensive datasets of defective graphene superstructures, including graphitic N- and B-defects as well as tripyridinic N-defects. While our focus was on graphene with N heteroatom defects, the methodology was designed to serve as a versatile framework for engineering defects in any 2D material or surface, opening new avenues for precision surface design. By leveraging Bayesian Optimisation methods, trained on DFT formation energies, we successfully predicted the formation energies of approximately 160 million structures across the aforementioned systems.

The dataset is publicly accessible on the NOMAD database.

The work initiated by this project has continued beyond its conclusion, and the results have now been published on arXiv.

Next Steps

Inverse design of growth – Based on the SAMPLE approach, we aim to develop a user-friendly approach to predict the experimental conditions, such as temperature, pressure, and chemical environment, needed to grow graphene with specific defect properties. This inverse design approach has the potential to replace trial-and-error with computational guidance of growth experiments.

Learning the Hamiltonian of N-doped graphene – We have already started to develop a data-driven approach to learn the electronic Hamiltonian for N-doped graphene, using DFT and the MACE-H neural network potential. We aim to model how defects influence the material’s electronic and spectroscopic properties.

Learning forces without forces – We are testing whether the MACE-MP0 foundation model can be retrained to predict accurate atomic forces by using only our SAMPLE formation energy database.

High-Throughput Data-Driven Electrolyte Design to Enable Lithium Metal Batteries

Dr. Neubi Xavier – University of Surrey
Dr. Matthias Golomb – University of Surrey

Project Summary

Rechargeable batteries are a major part of our everyday lives and improving them further is crucial for future technology. The gold standard for high-performance next-generation batteries is the use of lithium metal as the anode material. One of the major bottlenecks to enabling lithium metal batteries is the increased reactivity between current electrolyte formulations and lithium, leading to uncontrollable side reactions during operation and ultimately causing battery failures.

Researchers are currently focusing efforts on engineering new electrolyte formulations, leading to hundreds of scientific papers being published each week. The amount of data generated makes it impossible for a single researcher to follow all available literature and hinders the rational design of new electrolytes.

In this project, Dr. Neubi Xavier and Dr. Matthias Golomb aim to collate this vast amount of data into an accessible database that will establish clear reporting standards and serve battery scientists, computational chemists, and AI researchers as a starting point for further experimental and computational investigations. Using large language models, they aim to extract property information on lithium metal electrolytes from a wide range of available scientific literature and identify common core descriptors for high-performing candidates. In addition, they will combine high-throughput atomistic simulations and machine learning to fill gaps in the resulting database, aiming to create the most complete and standardized picture of the lithium metal-compatible electrolyte research landscape to date.

Project Outcomes

We have developed and released two open-source software tools to support data-driven materials research. The first enables automated extraction of scientific data from literature using large language models, making it broadly applicable across research fields.

The second streamlines the setup of electrolyte simulations, significantly reducing time and potential errors by simplifying input requirements. The tool is available at https://github.com/neubifx/Battflow

Alongside these tools, we are creating a standardised dataset that combines experimental data with computational results generated through our automated workflows. A dedicated web interface is also in development to provide access to this resource.

Next Steps

We have established a collaboration with experimental partners in industry to verify the predictions of the database analysis. Future visits to their premises have been arranged to discuss codes and machine learning predictions of novel electrolytes generated using the data from this project, as well as to provide training for engineering staff. Furthermore, we are in the process of deploying a web interface for the standardization of low-level computational electrolyte analysis, making it accessible to experimental researchers in order to support the standardized reporting of computational results in electrochemistry.

Transforming Chemistry Labs with Safe and Intuitive Human-in-the-Loop Robotic Systems

Dr. Luis Figueredo – University of Nottingham
Dr. Ayse Kucukyilmaz – University of Nottingham
Dr. Gabriella Pizzuto – University of Liverpool

Project Summary

This project aims to transform chemistry labs through robotics and AI—overcoming adoption barriers while enhancing safety and efficiency. We’ll develop a framework that empowers chemists to intuitively teach robots experimental tasks via multimodal demonstrations, eliminating the need for programming expertise while ensuring stringent safety for seamless human-in-the-loop (HIL) operation. Our approach leverages generative AI for semantic scene understanding, grounded in model-based representations to enhance explainability and safety. This enables robots to interpret dynamic lab environments and manipulate glassware and hazardous substances. A certified safety layer ensures compliance with strict standards, advancing HIL automation in chemistry and aligning with AIchemy’s mission to foster intuitive, high-trust robotics in scientific research.

The automation of chemistry labs remains challenging due, among other reasons, to the requirements for precise and safe manipulation of hazardous substances, diverse glassware, and evolving experimental setups. Traditional robotic solutions require extensive programming expertise, limiting accessibility. Our approach leverages multimodal human demonstrations—combining kinaesthetic, visual, and haptic inputs—to develop constraint-based robotic behaviours that chemists can intuitively guide. Certified safety layers ensure secure robotic handling of hazardous liquids, enabling seamless human-robot collaboration in high-stakes lab environments.

Project Outcomes

This project demonstrated that our SpillNot-inspired trajectory optimisation approach, originally developed for liquids, can be successfully extended to granular materials, reducing spillage during robotic transport by up to 85%.

Using a robotic arm system, we tested a range of common granular materials with different flow properties. Across all cases, the optimised control method significantly reduced material loss compared to standard motion planning approaches, typically achieving reductions of 50–80%. These results show that effective spill reduction can be achieved even without detailed models of granular dynamics.

In addition to these experimental results, the project delivered a reusable software and experimental pipeline for safe robotic handling of granular materials. The system was tested across multiple research sites, demonstrating strong reproducibility and potential for broader adoption in automated laboratory environments. A prototype virtual environment was also developed to support human-in-the-loop control, including early work on AI-driven safety features.

The project has also contributed to skills development, training researchers in safe robotic automation and advanced control methods. Methodological documentation, code modules, and the granular-transport dataset will form the basis for a first joint conference submission and a video demonstrator showcasing the collaboration and experimental findings.

Next Steps

Looking ahead, the next phase will expand the range of materials and experimental conditions, explore more complex and safety-critical scenarios, and extend the framework towards controlled dispensing of solid materials. Further development will also focus on intelligent, user-guided robotic systems, integrating AI to improve safety, usability, and interaction in laboratory settings.

These efforts will underpin a series of planned publications and future funding proposals, while strengthening collaboration in the development of safe, AI-driven laboratory automation.

X-GAMES: Crystallography with Machine Learning

Prof. Craig Butts – University of Bristol
Dr. Calvin Yiu – University of Bristol

Project Summary

We will build a proof-of-principle generative AI tool – X-GAMES – to identify chemical structures directly from powdered samples by combining NMR spectroscopic and X-ray diffraction data. This is of significant value to the pharmaceutical industry, where the chemical structure of molecules and their packing in crystals control their drug properties.

Existing generative AI methods are very good at creating a myriad of images or text on a generalised subject and can also be taught to create molecules that fit broad characteristics, e.g. “make me a molecule that might be drug-like”. However, generative structure determination is a much harder challenge – as it requires generating the one-and-only chemical structure that fits uniquely to a particular set of spectroscopic data. At Bristol we have developed early prototype systems capable of doing this albeit only for molecules with a few atoms – based on ‘inverting’ a neural network version of our existing IMPRESSION machine learning architecture that was designed to predict solution state NMR parameters. Essentially this prototype predicts structures from spectra, rather than spectra from structures.

The goal of this X-GAMES project is to build out this prototype so that it works for larger, more complex drug-like molecules. To achieve this, we will train X-GAMES on 10-100x larger datasets which integrate X-ray diffraction data as well as the NMR data that our prototype is already designed to use.

Project Outcomes

We have made strong progress in developing innovative tools that combine advanced spectroscopy with artificial intelligence. Our team successfully created prototype methods integrating solid-state NMR data with machine learning predictions, alongside exploring new ways to represent structural information from X-ray data.

As part of the project, our researcher developed expertise in multi-modal AI and has since been recruited by our project partner, AstraZeneca, into their machine learning team, highlighting the strength and impact of the training environment.

We are currently preparing research publications focused on our inverse-IMPRESSION approach, which will contribute to the broader development of the X-GAMES framework. Alongside this, we have expanded our technical approach beyond graph-based methods to include emerging architectures such as large language models and diffusion models, ensuring flexibility and future scalability.

Next Steps

Looking ahead, we are actively pursuing additional funding opportunities, including from EPSRC and ERC, to further advance this work. We are also building collaborations with leading international researchers to establish shared datasets and benchmarking standards. While development continues, we remain committed to progressing key elements of the X-GAMES framework through ongoing research activities within our group.

R1 2025
R2 2025

AI Agents for Chemistry: Designing the LLM-CDS Prototype

Meng Fang – University of Liverpool

Purpose, Aims, and Scope of the Visit:

This three-month visit to University College London, hosted by Prof. Jun Wang, will focus on designing and prototyping a Large Language Model-based Chemical Data Scientist (LLM-CDS). Building on the Agent K framework, the project aims to create an autonomous agent capable of structured reasoning across diverse chemical data and literature to support reaction prediction, property inference, and hypothesis generation. The visit will deliver a functional prototype, establish benchmarking protocols, and develop collaborative outputs including a technical paper and funding proposal. This collaboration combines expertise in AI agents and chemistry, advancing trustworthy, data-driven automation in chemical discovery.

Autonomous Optimisation of Sustainable Ligand Synthesis for Materials Discovery

George Lyall-Brookes – University of Liverpool

Purpose, Aims, and Scope of the Visit:

George’s proposed visit to the University of Cambridge, hosted by Prof. Alexei Lapkin, will focus on developing a sustainable and transferable route to polyaromatic ligand precursors using self-optimising algorithms in flow chemistry. These ligands are essential for constructing advanced materials, yet their synthesis remains a bottleneck. By applying multi-objective optimisation and autonomous closed-loop systems, George aims to create high-yielding, cost-effective, and environmentally friendly procedures. The visit will combine hands-on lab work with algorithmic development, increasing his expertise in flow chemistry and AI-driven synthesis while contributing to scalable solutions for high-throughput materials screening.
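Multi-objective optimisation of this kind typically looks for conditions that are not beaten on every objective at once (the Pareto front). The sketch below is a minimal illustration only: the two objectives (yield to maximise, cost to minimise) and the example values are assumptions, not data from the project.

```python
def pareto_front(points):
    """Return the (yield, cost) points not dominated by any other point.

    A point is dominated if another point has yield at least as high and
    cost at least as low, and is strictly better on at least one of them.
    """
    front = []
    for i, (yield_i, cost_i) in enumerate(points):
        dominated = any(
            yield_j >= yield_i
            and cost_j <= cost_i
            and (yield_j > yield_i or cost_j < cost_i)
            for j, (yield_j, cost_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((yield_i, cost_i))
    return front


# Hypothetical results: (yield in %, cost in arbitrary units) per experiment.
experiments = [(60, 5), (80, 9), (70, 5), (80, 12), (50, 2)]
print(pareto_front(experiments))
```

In a closed-loop setting, an optimiser would propose new conditions expected to extend this front, and the front would be recomputed after each batch of experiments.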

Generative AI-Guided Discovery of Polar Materials for Ferroelectric Applications

Andrij Vasylenko – University of Liverpool

Purpose, Aims, and Scope of the Visit:

This collaborative visit aims to establish a computational–experimental partnership between Andrij and Empa to accelerate the discovery of new non-centrosymmetric polar materials, particularly ferroelectrics. The project will apply PIGEN, a new in-house physics-informed generative AI model for inorganic crystal structure design, to predict and prioritise promising candidates for experimental synthesis. Two short visits will align research goals, validate AI-generated structures, and refine modelling strategies toward publication and a Horizon Europe proposal. Combining AI-based prediction with Empa’s synthesis expertise, the collaboration will demonstrate the power of generative models in guiding experimental discovery of functional materials.

Integrating Machine Learning Potentials with Neutron Scattering for Advanced Materials Simulation at the European Spallation Source

Harry Richardson – University of Bristol

Purpose, Aims, and Scope of the Visits:

This visit supports the development of a machine learning-enhanced workflow for analysing neutron scattering data at the European Spallation Source (ESS). The project focuses on integrating machine learning interatomic potentials (MLIPs) into neutron scattering data workflows, enhancing automation and analytical precision. Through collaboration with the ESS Data Management and Software Centre, Harry will develop methods linking MLIP simulations to experimental scattering pipelines. The visit will also explore opportunities for ESS “first science” experiments validating MLIPs using Quasi Elastic Neutron Scattering (QENS) on liquid systems and refine fine-tuning strategies for disordered materials. This collaboration will strengthen the synergy between machine learning and neutron scattering, ensuring data-driven, physics-informed analysis is embedded in future ESS research infrastructure.

Understanding of mechanistic principles in gold(I)-catalysed synthesis using a top-down machine learning approach

Risnita Listyarini – University of Strathclyde

Purpose, Aims, and Scope of the Visit:

During this two-month visit to the University of British Columbia, Risnita will collaborate with Dr. Jolene Reid to apply data-driven machine learning approaches to study asymmetric gold(I)-catalysed reactions. The project aims to uncover mechanistic principles governing enantioselectivity using top-down modelling techniques such as clusterwise linear regression, which identify key molecular interactions with minimal human bias. By developing predictive statistical models, the collaboration seeks to extend mechanistic understanding across diverse reaction types, improving catalyst design and reaction optimisation. This work will enhance Risnita’s expertise in data-driven chemical modelling while advancing the broader goal of applying AI to complex organic reaction mechanisms.

Agentic Reasoning Models for AI-Driven Chemical Discovery

Nicholas Runcie – University of Oxford

Purpose, Aims, and Scope of the Visit:

This six-week visit to EPFL, hosted by Prof. Philippe Schwaller, will establish a new collaboration between the Oxford Protein Informatics Group and the Laboratory of Artificial Chemical Intelligence. The project will focus on advancing large language models for chemistry by combining recent progress in reasoning models with autonomous agentic systems. The overall aim is to develop more capable and practically useful AI tools that can assist chemists with core research tasks such as hypothesis generation, experiment planning, and interpretation of complex experimental data.

AI-Driven Automation Frameworks for Cross-Domain Materials Discovery

Aritra Roy – London South Bank University

Purpose, Aims, and Scope of the Visit:

This one-month visit to EPFL, hosted by Prof. Berend Smit, will focus on learning from leading work in AI-driven automation for materials discovery and establishing the basis for future collaboration across chemistry and artificial intelligence. The project aims to understand how robust, production-quality automation frameworks are developed for community use, particularly in integrating data extraction, validation, density functional theory calculations, molecular simulations, and machine learning into reliable discovery pipelines.

AI-Driven Multi-Fidelity Optimisation of Mesoporous Silica Synthesis

Mengjia Zhu – University of Liverpool

Purpose, Aims, and Scope of the Visit:

This visit to the University of Washington, with collaborative links to the University of Cambridge, will focus on developing AI-driven multi-fidelity optimisation strategies for mesoporous silica synthesis. Working with Prof. Lilo Pozzo and Prof. Shijing Sun, the project aims to define a practical optimisation problem grounded in real experimental workflows, using SAXS and USAXS signatures, morphology descriptors, and hierarchies of data quality ranging from rapid laboratory measurements to high-precision synchrotron data. The visit will also explore how Bayesian optimisation and active-learning methods can incorporate expert feedback and chemical intuition to guide experiment selection more effectively.

Hierarchical Generative Modelling of Metal-organic Frameworks

Gaopeng Ren – Imperial College London

Purpose, Aims, and Scope of the Visit:

This visit to the University of Toronto, hosted by Mohammed Moosavi, will focus on advancing machine learning methods for the design of metal–organic frameworks through hierarchical generative modelling. The project aims to establish a coarse-to-fine framework that captures the multiscale nature of MOFs, from local coordination environments and connectivity patterns to extended network structure and pore-level properties. By applying modern generative models to coarse-grained structural representations and refining these towards atomistic detail, the project will seek to enable more efficient and chemically realistic exploration of MOF design space.

Language-based Agentic Workflow for Porous Materials Discovery

Xenofon Evangelopoulos – University of Liverpool

Purpose, Aims, and Scope of the Visits:

This visit to Stanford University, hosted by Prof. Yejin Choi, will strengthen an existing collaboration at the interface of large language models and materials discovery. Building on recent joint work on LLMs for metal–organic frameworks, the project will focus on developing language-based agentic systems that can move beyond materials description and recommendation towards hypothesis generation for porous materials discovery. The aim is to investigate how LLMs can generate, refine, and critique scientific hypotheses while also responding to practical laboratory constraints and feedback.

Leveraging Ontological Knowledge with Argumentative Agentic AI to Accelerate Chemical Development

Prof. Alexei Lapkin – University of Cambridge
Prof. Francesca Toni – Imperial College London
Dr. Antonio Rago – King’s College London

Project Summary

In this project we aim to demonstrate how relational databases can enhance AI workflows, focusing on an advanced argumentative agentic AI approach and the well-known and highly challenging problem of predictive scalability in chemical process development. Specifically, the project will aim to develop a human-in-the-loop AI framework for guiding multi-step process scale-up in the context of small molecule active pharmaceutical ingredients (APIs) manufacture.

Project Team

Prof. Alexei Lapkin is a Professor of Sustainable Reaction Engineering at the University of Cambridge. His group pioneered the use of Bayesian Optimisation in process engineering and developed tools that became widely adopted in academia/industry (Summit, ORDerly, reaction balancing, etc). His work on AI methods in chemical process development led to the establishment of four start-up companies (ReactWise, Accelerated Materials, GreenCAT, Chemical Data Intelligence).

Prof. Francesca Toni is Professor in Computational Logic and Royal Academy of Engineering/JP Morgan Research Chair on Argumentation-based Interactive Explainable AI at Imperial College London, UK. She leads the Computational Logic and Argumentation research group and the XAI research centre. Her research interests lie within the broad area of Knowledge Representation and Reasoning in AI and Explainable AI. Among notable past projects, she has coordinated two EU projects, was Technical Director of the ROAD2H EPSRC-funded project and co-Director for the Centres of Doctoral Training in Safe and Trusted AI and in AI for Healthcare. She is on the Board of Directors of KR Inc. and is an IJCAI trustee.

Dr Antonio Rago is a Lecturer in Computer Science at King’s College London. He has been one of the pioneers of argumentative XAI, delivering numerous high-profile talks on the subject, including one on IJCAI’s invited Early Career Researcher track. He has been instrumental in finding novel application areas of argumentation and XAI, including mechanical engineering, online review aggregation and judgmental forecasting. He has also pioneered, in a collaboration with Mercedes-AMG Petronas F1 Team, the use of explainable RL for F1 race strategy.

Potential for Impact

The project will impact the academic community working in AI for chemistry by developing a standardised process chemistry ontology and a suite of agents that operate with relational chemistry databases. This will create a foundational layer for the future deployment of advanced AI tools in process chemistry and can be extended to discovery chemistry and materials. The human-in-the-loop framework of multi-agent argumentative AI is an advanced AI concept and will serve as a critical demonstration of advances in AI for chemistry challenges, whereas the developed codes could be used for rapid implementation within industrial settings.

Take a look at the PDRA Opportunities here

Alignment of Generative AI for Materials Discovery via Experimental Feedback

Dr. Shijing Sun – University of Cambridge
Prof. Aron Walsh – Imperial College London

Project Summary

Our project directly addresses the challenge of bridging the simulation-to-real gap in inorganic materials discovery. We propose an integrated, closed-loop platform that connects generative AI for crystals with high-throughput robotic synthesis and characterisation. We will deliver a processing-conditioned, disorder-aware generative model that is calibrated by small-batch experimental feedback. Our objective is to align generative models with real experimental outcomes, enabling AI-driven discovery in which hypothesis generation, synthesis, and evaluation are all automated and connected.

Watch the Interview

Project Team

Dr Shijing Sun is an experimental materials scientist specialising in high-throughput synthesis, characterisation, and accelerated device testing for energy materials across academia and industry. Her work centres on automated processing of solution-processed semiconductors (halide perovskites and perovskite-inspired compounds). Recent work includes applying computer vision and reinforcement learning to microstructure analysis and synthesis planning of defect-minimised Ag–Bi–I thin films, and a Bayesian-optimisation-driven robot for lead halide perovskite synthesis under humid conditions. Formerly Assistant Professor of Mechanical Engineering at the University of Washington, she developed low-cost, modular “Lego-style” self-driving laboratory platforms for solution-phase synthesis. In September 2025 she joined the Department of Materials Science and Metallurgy at the University of Cambridge, where she is establishing the Autonomous Labs for Energy Materials, focused on functional inorganic and hybrid framework materials.

Prof. Aron Walsh is a computational materials chemist whose research integrates quantum-mechanical simulation, machine-learned force fields, and generative models to design functional materials for renewable energy and optoelectronics. He is Professor at Imperial College London and leads the Materials Design Group, which develops open, reproducible software and data workflows used internationally. He has led large, multi-partner programmes and supervises an interdisciplinary team spanning chemistry, physics, and computer science; alumni now hold positions in academia and industry. Within this project he will direct the computational work package, linking ab initio calculations with ML force fields and generative models, and establishing FAIR practices to couple predictions with experimental validation and iteration.

Potential for Impact

The outcomes will advance both the AI–chemistry interface and the materials science community:

Scientific Impact: The project will assess whether generative AI can predict stable material phases that are experimentally realisable. Success will demonstrate a viable route to bypass the limitations of traditional high-throughput computational screening, which is confined to exploration of the known chemical space.

Community Impact: The UK is globally competitive in laboratory automation, exemplified by autonomous synthesis platforms at the University of Liverpool and the University of Glasgow, among others. Building on this national strength, our proposal brings a complementary synthesis capability for solution-processed thin-film semiconductors, led by Dr Sun, a new PI at the University of Cambridge. All generated structures, experimental characterisation data, and model checkpoints will be shared openly via the Materials Cloud and GitHub, following FAIR data principles. This ensures reproducibility and supports downstream adoption by other research groups and industrial partners.

Beyond the Project: The proposed AI–experiment loop can be extended beyond halides to many other chemical systems, helping to explore parts of materials space that traditional screening overlooks. By making generative design disorder- and processing-aware, the approach can uncover realistic new candidates across fields such as energy storage, catalysis, and quantum technologies. This foundation supports future EPSRC Centre or Programme Grant proposals on closed-loop AI materials discovery.

Take a look at the PDRA Opportunities here

Human-AI Teaming for Chemistry (HATCH)

Dr. Jihong Zhu – University of York
Dr. Gabriella Pizzuto – University of Liverpool
Professor Robert Gaizauskas – University of Sheffield
Professor Ian Fairlamb – University of York

Project Summary

A step-change in synthetic chemistry can only be realised through an intelligent and physical synergy between human chemists, AI coordinators, robotic platforms and chemistry equipment. This project aims to provide and validate such a framework, bridging the gap between today’s highly equipped, specialist laboratories and the wider chemistry community, and creating an accessible, truly collaborative laboratory of the future.

Watch the Interview

Project Team

Dr. Jihong Zhu is an expert in robot learning. He established the Robot-assisted Living Lab (RALLA) at York, equipped with bimanual manipulators and advanced sensors, providing the core infrastructure and technical leadership for this project. His pioneering work on human-centred bimanual manipulation demonstrates proven success in creating complex, human-AI collaborative systems.

Dr. Gabriella Pizzuto is a RAEng Research Fellow specialising in AI-driven “robotic scientists.” Her work on robot learning for chemistry applications, recognised at flagship robotics conferences (ICRA, CASE), has attracted additional EPSRC funding, including a recently awarded New Investigator Award. Her leadership roles within the Henry Royce Institute and AIchemy (ECR committee co-chair) place her at the forefront of this field.

Prof. Robert Gaizauskas is a world-leading expert in Natural Language Processing (NLP). He has worked extensively on text mining from the scientific literature, and his recent research on using LLMs for spoken language interaction with robots is directly relevant to this project. His leadership in the Sheffield NLP group and a UKRI CDT ensures AI development will be built on a foundation of cutting-edge research and best practices. He brings not only NLP expertise but also the experience of colleagues working on robust speech recognition in noisy environments, with whom he collaborates closely.

Prof. Ian Fairlamb provides world-class expertise in metal catalysis and automated chemistry. He directs the highly equipped Automated Robotics for Chemistry Laboratory at York, a hub connecting chemistry with computer science and engineering. His strong industrial links, including a PhD studentship with Labman and equipment investment from Chemspeed, underscore the industrial relevance of his work. As co-director of ALBERT, a doctoral training centre on automated laboratory experiments, he is at the forefront of training the next generation of researchers in this field.

Potential for Impact

This project will transform chemical discovery by combining AI, automation, and robotics into a self-driving laboratory. By dramatically increasing the speed, safety, and efficiency of experimentation, it will accelerate the development of new medicines, sustainable materials, and clean energy solutions, helping tackle global challenges from healthcare to climate change.

Take a look at the PDRA Opportunities here

