Browse through the projects funded by AIchemy, highlighting cutting-edge research, training, and collaboration at the intersection of AI and chemistry.
This page will be regularly updated with project outcomes, results, and new developments, so please check back often to stay informed about the latest progress from these projects.
Student Internships 2024
Pump Priming Funding Call 2024
Student Internships 2025
Collaborative Travel (R1) 2025
Frontier Fund 2026
Accelerating Chemistry Lab Automation Through AI-Driven Robotics
Alex Wright
Host University: University of Liverpool
Host Academic: Dr. Gabriella Pizzuto
Alex’s project investigated the simulation-to-reality gap in particle behaviour for robotic chemistry, using the ORBIT open-source robotics simulator. He focused on developing a method to represent this gap in simulated particles so that they become more useful to robotic chemists. The project involved deploying a simulated robotic environment to measure the angle of repose and comparing the results with real-world experiments. The ultimate goal was to refine simulation parameters so that virtual and physical systems align more closely.
The project resulted in the development of a low-cost powder flow testing method to analyse discrepancies between simulated and real particle behaviour. Key findings revealed significant challenges in accurately replicating particle collisions, as small particles passed through simulation boundaries, skewing comparisons. Additionally, simulated salt particles differed markedly from their real counterparts in appearance and flow, producing an unrealistically low angle of repose. These findings highlight the need for further refinement of environment parameters or alternative particle models. With more time and data, the workflow could be improved to reduce these discrepancies and enhance the simulator’s realism.
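As a reference point for the comparison described above: for a roughly conical heap, the angle of repose is commonly estimated from the heap height h and base radius r as θ = arctan(h / r). The sketch below applies that formula to a real and a simulated heap. The heap dimensions are illustrative values, not data from the project, and a real workflow would extract h and r from images or simulation output.

```python
import math

def angle_of_repose_deg(height: float, base_radius: float) -> float:
    """Angle of repose of a conical heap, in degrees: theta = atan(h / r)."""
    if base_radius <= 0:
        raise ValueError("base_radius must be positive")
    # atan2(h, r) equals atan(h / r) for r > 0 and avoids a division.
    return math.degrees(math.atan2(height, base_radius))

# Illustrative (hypothetical) heap dimensions in millimetres.
real = angle_of_repose_deg(height=20.0, base_radius=34.6)
sim = angle_of_repose_deg(height=8.0, base_radius=34.6)
# A positive gap means the simulated particles form a flatter heap,
# i.e. they flow more freely than the real powder.
gap = real - sim
print(f"real {real:.1f} deg, simulated {sim:.1f} deg, gap {gap:.1f} deg")
```

A lower simulated angle, as in this toy case, is the kind of discrepancy reported for the simulated salt particles.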
Through this project, Alex strengthened his ability to work within a collaborative coding environment, building upon existing particle physics frameworks and integrating new methods. He gained foundational experience with ROS, learning how simulated environments can be transferred to physical robotic setups. He also improved his communication and teamwork skills, as he frequently collaborated with colleagues to troubleshoot and refine code. Additionally, Alex improved his technical abilities in 3D CAD modelling and scientific presentation, broadening his practical and professional skill set beyond coding alone.
Bayesian Optimization for Chemical Applications
Eltjo Mante
Host University: Imperial College London
Host Academic: Prof. Kim Jelfs
This project aimed to enhance a Bayesian Optimization web application (web-BO) and explore multi-fidelity Bayesian Optimization (MFBO) for chemical problems. The first goal was to improve the usability and functionality of web-BO through bug fixes and new features. The second goal was to investigate MFBO methods in the context of chemical simulations, using the 6D Hartmann function as a benchmark problem.
Key outcomes included the development of a video tutorial page to replace a looping GIF, enabling users to pause and navigate freely. An explanations page was added to clarify tool functionality and input formats, complete with sidebar navigation and tooltips. Several bugs were resolved, including overlapping input cells and trailing decimals, and graph displays were cleaned up for clarity.
On the MFBO side, the project compared low- and high-fidelity optimization methods, revealing that low-fidelity approaches often reached the global minimum more effectively than high-fidelity ones, which tended to get stuck in local minima. A major challenge was modeling correlation between fidelities, which proved to be highly problem-specific and complex. Future work could focus on integrating MFBO into web-BO, with correlation modeling as a key step.
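The 6D Hartmann function is a standard synthetic benchmark defined on the unit hypercube [0, 1]^6, with several local minima and a global minimum of about -3.32237 near x* ≈ (0.2017, 0.1500, 0.4769, 0.2753, 0.3117, 0.6573). A minimal pure-Python definition using the commonly published constants is sketched below, together with a naive random-search baseline. The baseline is only for illustration: it is not the MFBO method studied in the project, and no low-fidelity variant of the function is shown.

```python
import math
import random

# Commonly published constants of the 6D Hartmann function.
ALPHA = [1.0, 1.2, 3.0, 3.2]
A = [
    [10.0, 3.0, 17.0, 3.5, 1.7, 8.0],
    [0.05, 10.0, 17.0, 0.1, 8.0, 14.0],
    [3.0, 3.5, 1.7, 10.0, 17.0, 8.0],
    [17.0, 8.0, 0.05, 10.0, 0.1, 14.0],
]
P = [  # multiplied by 1e-4 inside hartmann6
    [1312, 1696, 5569, 124, 8283, 5886],
    [2329, 4135, 8307, 3736, 1004, 9991],
    [2348, 1451, 3522, 2883, 3047, 6650],
    [4047, 8828, 8732, 5743, 1091, 381],
]

def hartmann6(x):
    """6D Hartmann function on [0, 1]^6 (to be minimised)."""
    total = 0.0
    for i in range(4):
        inner = sum(A[i][j] * (x[j] - 1e-4 * P[i][j]) ** 2 for j in range(6))
        total += ALPHA[i] * math.exp(-inner)
    return -total

def random_search(n_evals, seed=0):
    """Naive baseline: best value found by uniform random sampling."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_evals):
        x = [rng.random() for _ in range(6)]
        best = min(best, hartmann6(x))
    return best

# Approximate location of the global minimum (rounded coordinates).
X_STAR = [0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573]
```

A Bayesian optimiser would replace `random_search` with a surrogate model plus an acquisition function; the number of evaluations needed to approach `hartmann6(X_STAR)` is the usual yardstick.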
Eltjo gained hands-on experience in machine learning, full-stack web development (Flask, HTML, CSS, Bootstrap, JavaScript), and version control using Git and GitHub. The collaborative environment also fostered growth in communication and teamwork, balancing independent problem-solving with guidance from experienced peers.
Data Science Approach to Crystal Structures
Ziqiu Jiang
Host University: University of Liverpool
Host Academic: Prof. Vitaliy Kurlin
This project aimed to use the average-minimum-distance (AMD) metric to assess the similarity between novel predicted crystal structures and experimentally determined structures stored in the Cambridge Structural Database (CSD). This comparison is essential for validating computational crystal structure predictions, as it provides a quantitative measure of how closely they align with known experimental data.
Ziqiu successfully computed the nearest neighbours of predicted crystal structures by comparing them with over 850,000 experimental crystal structures from the CSD. This was achieved using the Average Deviation from Asymptotic (ADA) metric, an invariant that accounts for both geometry (through AMD) and the density of atomic points in crystals. Most high-density predicted crystal structures closely matched experimental structures, whereas lower-density predicted crystals often showed larger gaps to their nearest experimental structures. These cases could inform further research in crystal structure prediction and validation.
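To make the AMD idea concrete: AMD_k is the average, over the points in one unit cell, of the distance from each point to its k-th nearest neighbour in the infinite periodic crystal, and two structures can then be compared through, for example, the L-infinity distance between their AMD vectors. The naive pure-Python sketch below assumes Cartesian coordinates and a brute-force block of neighbouring cells. It is illustrative only: it does not compute the ADA invariant used in the project, and dedicated implementations are far faster and choose the neighbour search region properly.

```python
import itertools
import math

def amd(cell, motif, k, n_shells=2):
    """Naive average minimum distance (AMD) vector of a periodic point set.

    cell:  three lattice vectors (rows) in Cartesian coordinates.
    motif: Cartesian coordinates of the points in one unit cell.
    k:     number of neighbours. n_shells must be large enough that the
           k nearest neighbours of every motif point lie inside the
           (2 * n_shells + 1)^3 block of cells generated here.
    """
    cloud = []
    for n in itertools.product(range(-n_shells, n_shells + 1), repeat=3):
        shift = [sum(n[i] * cell[i][d] for i in range(3)) for d in range(3)]
        cloud.extend([[p[d] + shift[d] for d in range(3)] for p in motif])
    totals = [0.0] * k
    for p in motif:
        dists = sorted(math.dist(p, q) for q in cloud)
        for i, d in enumerate(dists[1 : k + 1]):  # skip the zero self-distance
            totals[i] += d
    return [t / len(motif) for t in totals]

def amd_distance(v1, v2):
    """L-infinity (Chebyshev) distance between two AMD vectors."""
    return max(abs(a - b) for a, b in zip(v1, v2))
```

For a simple cubic lattice with unit spacing, the first six AMD entries equal 1 (face neighbours) and the next twelve equal √2 (edge-diagonal neighbours); uniformly scaling the lattice by 1.1 shifts the first six entries by 0.1.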
Ziqiu gained a deeper understanding of mathematical invariants and their applications in computational chemistry. He also learned key concepts in crystallography, particularly the structural properties of crystals and the definition of crystal components, and how these concepts can be integrated into computational models. By applying algorithms to compare predicted crystal structures with experimental data, he developed practical skills in analysing crystal structures and identifying structural similarities, further reinforcing his computational and analytical expertise.
Learning from large-scale CSP – database-informed prediction of spontaneous resolution
Leo Arogundade
Host University: University of Southampton
Host Academic: Prof. Graeme Day
This project aimed to investigate classifier models for predicting chiral resolution by crystallisation of small organic molecules. The models were trained on computational data comparing the relative stabilities of racemic and enantiopure crystal structures. As part of the workflow, Leo used the Cambridge Structural Database API to extract molecular files, applied RDKit to calculate molecular descriptors, and trained logistic regression and support vector machine models to predict crystallisation outcomes. The project contributes to understanding how molecular features influence chiral resolution and explores the potential of machine learning in crystal engineering.
Leo applied several classification models to predict chiral resolution based on energy differences between racemic and enantiopure crystal structures from a dataset of 298 molecules. Initial models (SVM, logistic regression, decision tree) achieved over 90% accuracy but were biased toward predicting racemic crystallisation due to class imbalance. A second round of modelling with balanced data reduced bias but also lowered accuracy, which was later improved through feature selection and de-correlation, resulting in final accuracies of 83% (SVM), 62% (logistic regression), and 73% (decision tree). Toward the end of the project, Leo began planning how to expand the dataset by proposing new molecules for future study in the forerunner CSP project. These outcomes highlight both the potential and limitations of machine learning in predicting crystallisation behaviour.
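The drop in accuracy after balancing is easier to interpret with a concrete case. With imbalanced classes, a model that always predicts the majority class can score high accuracy while learning nothing; balanced accuracy (the mean of per-class recall) exposes this. The sketch below uses a hypothetical 270/28 split of the 298 molecules, since the real class counts are not given here, and also computes class weights with the n_samples / (n_classes × count) heuristic that scikit-learn uses for `class_weight="balanced"`.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def balanced_class_weights(y):
    """n_samples / (n_classes * count_c) for each class c."""
    counts = Counter(y)
    return {c: len(y) / (len(counts) * n) for c, n in counts.items()}

# Hypothetical 270/28 split of 298 molecules (the real split is not stated).
y_true = ["racemic"] * 270 + ["enantiopure"] * 28
# A degenerate model that always predicts the majority class.
y_pred = ["racemic"] * 298

acc = accuracy(y_true, y_pred)             # about 0.906, looks good
bal = balanced_accuracy(y_true, y_pred)    # 0.5, no better than chance
weights = balanced_class_weights(y_true)   # minority class weighted up
```

Weighting the loss this way (or resampling) trades some raw accuracy for fewer majority-class predictions, which is consistent with the pattern described above.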
Through this project, Leo gained practical experience in literature analysis, molecular data extraction using RDKit and MOL files, and using the Cambridge Structural Database API to retrieve crystal information via refcodes. He developed a strong understanding of machine learning algorithms—support vector machines, logistic regression, and decision trees—and learned how to implement them in Python. Additionally, he improved his presentation skills, and learned to set up computing environments on a supercomputer (Iridis5), broadening his technical capabilities in high-performance computing.
Machine Learning Tight Binding for Proton Battery
Xuheng Zhao
Host University: Imperial College London
Host Academic: Dr. Jarvist Frost
Xuheng’s project aimed to optimise Density Functional Tight Binding (DFTB) parameters from a standard database specifically for targeted materials, using machine learning techniques with the assistance of the Tight Binding Machine Learning Toolkit (TBMaLT). These refined parameters are intended for use in molecular dynamics simulations to predict material properties, contributing to the development of a proton battery. The project focused on building and validating a workflow using small molecules to demonstrate its feasibility and accuracy. Ultimately, the goal is to establish a robust, transferable simulation pipeline for future energy-related materials research.
A machine learning workflow using TBMaLT was successfully built to optimise DFTB parameters, involving dataset preparation, input formatting, model training with PyTorch, and performance evaluation against DFT ground truth. However, due to limitations in TBMaLT’s current development stage, the predicted properties lacked sufficient accuracy, and final parameter extraction for molecular dynamics was not completed. In parallel, Xuheng developed a non-ML optimisation workflow using the Nelder-Mead algorithm, which directly refines DFTB parameters via DFTB+ calculations and MSE loss minimisation. While promising, this workflow requires further refinement to ensure physically meaningful Hamiltonian matrices. Together, these efforts lay the groundwork for a robust simulation pipeline to support proton battery material design.
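The non-ML route above (derivative-free Nelder-Mead search minimising an MSE loss against reference data) can be sketched on a toy problem. In the real workflow each loss evaluation runs DFTB+; here the "model" is a hypothetical two-parameter repulsive function a·exp(−b·r) fitted to synthetic reference values, and the optimiser is a simplified Nelder-Mead (inside contraction only). In practice one would typically call SciPy's `scipy.optimize.minimize(..., method="Nelder-Mead")` instead.

```python
import math

def nelder_mead(f, x0, step=0.5, max_iter=5000, tol=1e-14):
    """Simplified Nelder-Mead minimiser (reflection, expansion,
    inside contraction, shrink). Returns (best_point, best_value)."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        x = list(x0)
        x[i] += step
        simplex.append(x)
    values = [f(x) for x in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = [c + 2 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:  # shrink every vertex towards the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]

# Hypothetical stand-in for DFTB+ output: a repulsive term a * exp(-b * r),
# fitted to synthetic "DFT" reference values generated with a=2.0, b=1.5.
R = [1.0, 1.5, 2.0, 2.5, 3.0]
REFERENCE = [2.0 * math.exp(-1.5 * r) for r in R]

def mse_loss(params):
    a, b = params
    preds = [a * math.exp(-b * r) for r in R]
    return sum((p - t) ** 2 for p, t in zip(preds, REFERENCE)) / len(R)

best_params, best_loss = nelder_mead(mse_loss, [1.0, 1.0])
```

Because Nelder-Mead needs only loss values, it can wrap an external code such as DFTB+ without gradients, at the cost of many evaluations as the number of parameters grows.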
Through this project, Xuheng learned the fundamentals of tight-binding methods and how they integrate with AI in scientific research, enhancing her understanding of AI’s role in advancing science. She also gained knowledge in compiling and using software such as Gaussian, FHI-aims, and DFTB+, understanding their suitability and limitations for various purposes, and acquired a basic understanding of how scientific software packages such as TBMaLT are developed.
Machine Learning-Assisted High-Throughput Screening for Organic Semiconductors: A Comprehensive Study and Database Development
Malin Zollner
Host University: University of Strathclyde
Host Academic: Dr. Tahereh Nematiaram, Dr. Yashar Moshfeghi
Malin’s project aimed to explore the research landscape of organic semiconductors, focusing on their applications in sensors, photovoltaics, and light-emitting diodes. A key objective was to establish a literature-based database containing experimental data such as power conversion efficiency and structural/electronic fingerprints of donor and acceptor materials. Malin also planned to compute structural fingerprints and replicate machine learning methodologies from previous studies to trial high-throughput screening techniques. The ultimate goal was to support materials selection and optimisation using machine learning for organic photovoltaic applications.
She conducted a thorough literature review on organic semiconductors, identifying key descriptors, fingerprints, and machine learning algorithms used in sensor, photovoltaic, and LED applications. She developed code to run machine learning models and generate a database of molecular fingerprints and descriptors, which is now available for future research. A detailed database of organic donor and acceptor molecules was compiled, enhanced with Morgan, Daylight, and MACCS fingerprints. This resource supports ongoing efforts in machine learning-assisted materials screening for organic photovoltaics.
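Binary fingerprints such as Morgan or MACCS keys are commonly compared with Tanimoto (Jaccard) similarity, which is one simple way such a database can support screening. The sketch below treats each fingerprint as a set of on-bit indices and ranks library entries against a query; the molecule names and bit sets are hypothetical placeholders rather than entries from the compiled database, and in practice the fingerprints would be generated with a cheminformatics toolkit such as RDKit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    # Two empty fingerprints are treated as identical.
    return len(a & b) / union if union else 1.0

def rank_by_similarity(query, library):
    """Sort library entries (name -> on-bits) by similarity to query, best first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical on-bit sets standing in for real fingerprints.
library = {
    "donor_A": {1, 5, 9, 42, 77},
    "donor_B": {1, 5, 10, 300},
    "acceptor_C": {2, 8, 512},
}
query = {1, 5, 9, 42, 100}
ranking = rank_by_similarity(query, library)
```

Here `donor_A` shares four of six combined bits with the query (similarity 4/6), so it ranks first.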
Throughout the project, Malin gained a solid understanding of the fundamentals of organic semiconductors and how artificial intelligence is being applied in this field through literature review and research. Regular meetings with her supervisors helped her improve presentation skills and deepen her understanding of the research process. Under expert guidance, she developed and refined her coding skills, which were especially valuable given her limited prior experience. She also learned how to run machine learning algorithms and became familiar with key AI terminology, marking her first steps into computational research. Overall, the project provided a strong foundation for future work in AI-driven materials science.
Mapping the chemical space of intermetallic compounds
Ryan Napo Nduma
Host University: Imperial College London
Host Academic: Prof. Aron Walsh and Anthony Onwuli
The aim of the project was to learn and understand the chemical properties that distinguish intermetallics from other possible compounds within the chemical space, and then to use machine learning and relevant computational techniques to contribute a series of filters and rules to the Walsh Materials Design group’s SMACT software. These filters and rules would allow SMACT to screen appropriately for these compounds within the chemical space.
Ryan developed two key rules: the Valence Electron Count and the Electronegativity Difference to help distinguish intermetallic compounds, based on insights from literature and guidance from Prof. Walsh and Anthony. He applied these features in a classification exercise using random forest and logistic regression models, exploring how feature weighting affects model performance. His contributions included writing code that was integrated into the latest version of SMACT, enabling new functionality such as Valence Electron Count calculations. Additionally, he helped improve SMACT’s usability by fixing examples and tutorials. These outcomes align with the project’s goal of mapping the chemical space of intermetallics and enhancing tools for materials discovery.
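The two descriptors named above can be illustrated with simple composition-level definitions. The sketch below is a hypothetical illustration, not SMACT's implementation: it uses a tiny hand-written lookup table, counts valence electrons as s + d electrons (one convention common in the intermetallic and alloy literature), and takes the electronegativity difference as the spread of Pauling values across the elements present.

```python
# Small illustrative lookup tables (not SMACT's data files).
# Valence electrons counted as s + d electrons; Pauling electronegativities.
VALENCE_ELECTRONS = {"Ni": 10, "Al": 3, "Ti": 4, "Fe": 8}
PAULING_EN = {"Ni": 1.91, "Al": 1.61, "Ti": 1.54, "Fe": 1.83}

def valence_electron_count(composition):
    """Composition-weighted average valence electron count,
    e.g. {"Ni": 3, "Al": 1} for Ni3Al."""
    n_atoms = sum(composition.values())
    return sum(VALENCE_ELECTRONS[el] * n for el, n in composition.items()) / n_atoms

def electronegativity_difference(composition):
    """Largest Pauling electronegativity difference between the elements present."""
    values = [PAULING_EN[el] for el in composition]
    return max(values) - min(values)

vec_nial = valence_electron_count({"Ni": 1, "Al": 1})      # (10 + 3) / 2
vec_ni3al = valence_electron_count({"Ni": 3, "Al": 1})     # (30 + 3) / 4
den_nial = electronegativity_difference({"Ni": 1, "Al": 1})
```

A small electronegativity difference points towards metallic rather than ionic bonding, which is why such descriptors can serve as features or screening rules; any thresholds would come from the literature and the classification exercise, not from this sketch.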
Through this project, Ryan gained a deeper understanding of materials informatics, including its potential to drive innovation in sustainable and high-performance materials across industries. He developed practical skills in machine learning, mastering basic techniques like regression and classification while beginning to explore advanced concepts such as regularization and dimensionality reduction. Through regular meetings and workshops, he improved his presentation and communication skills, learning to express ideas clearly in a research setting. Ryan also learned how to initiate and manage a research project, from defining objectives to developing impactful solutions. Importantly, he took his first steps into computational research, gaining confidence in coding and working within a collaborative research environment.
Modelling Phase Transitions: Characterising Henry’s Law
Josh Cheung
Host University: University of Southampton
Host Academic: Dr. Joanna Grundy, Prof. Jeremy Frey
The main focus of this project was to use machine learning models to predict values for solubility (log S) and Henry’s law constant (kH), exploring the links between the two properties. This was achieved through data curation to create a combined dataset, followed by data processing, training and selection of machine learning models, and analysis of predictions against a withheld test set.
Two machine learning models were successfully developed to predict solubility and Henry’s law constant, trained on large datasets and tested on a shared subset of 2,563 datapoints. Both models achieved acceptable accuracy, with around 50% of predictions scoring below 1 for mean squared error (MSE), and the Henry’s law model performed exceptionally well for hydrocarbons. However, due to time constraints, the analysis of feature importance was not completed, leaving room for future investigation using techniques like recursive feature elimination and SelectKBest. A number of data quality and modelling flaws were identified, highlighting opportunities for improvement in data sanitation, feature engineering, and model refinement. The project lays a strong foundation for further development and analysis in predictive modelling of chemical properties.
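The reported metric can be read in more than one way. One plausible reading, which is an assumption here, is that about half of the test points had a squared error below 1 (in log units) while the overall MSE is averaged over all points. The sketch below computes both quantities for a tiny hypothetical set of log S values, not data from the project.

```python
def squared_errors(y_true, y_pred):
    """Per-point squared error."""
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

def mse(y_true, y_pred):
    """Mean squared error over the whole test set."""
    errs = squared_errors(y_true, y_pred)
    return sum(errs) / len(errs)

def fraction_below(y_true, y_pred, threshold=1.0):
    """Fraction of test points whose squared error is below the threshold."""
    errs = squared_errors(y_true, y_pred)
    return sum(e < threshold for e in errs) / len(errs)

# Hypothetical held-out log S values and model predictions.
y_test = [-2.0, -3.5, -1.0, -4.2]
y_hat = [-2.3, -5.0, -1.1, -2.9]

overall = mse(y_test, y_hat)            # dominated by the two large misses
share = fraction_below(y_test, y_hat)   # half the points are within 1 log unit squared
```

The two numbers tell different stories: a few large outliers can push the overall MSE above 1 even when half the predictions are close, which is why the per-point view is useful alongside the average.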
The project provided Josh with hands-on experience in data curation, model training, and result evaluation. They also gained valuable transferable skills in project organisation, time management, and scientific reporting. Importantly, this experience deepened their understanding of machine learning in physical chemistry and helped guide their career aspirations towards computational and data-driven research.
Accelerating Materials Discovery: Integrating Machine Learned Force Fields (MLFF) with Monte Carlo Simulations
Jay Zhou – University of Bath
Steve Parker – University of Bath
Tom Underwood – STFC, RAL
Project Summary
Recently, Machine Learned Force Fields (MLFFs) have become popular for Molecular Dynamics (MD) simulations. However, this is not the case for Monte Carlo (MC), which performs exceptionally well for systems such as gases, complex mixtures and adsorption. In fact, the machinery for MLFF-integrated MC is currently absent from open-source platforms. Therefore, the goal of this project is to develop an open-source software framework that integrates MLFFs with MC simulations to enhance the calculation of various thermodynamic properties of solid-state materials. By leveraging MLFFs trained on ab initio data, we aim to improve the accuracy of MC simulations in predicting free energy properties, while offering significantly faster computation than ab initio methods.
This approach addresses the limitations of classical force fields (CFFs), which are traditionally used to describe the interactions between atomic and molecular species in MC simulations but often fall short in accuracy. In this project, we will program a reliable interface between the Monte Carlo simulation package DL_MONTE (a member of the Daresbury Laboratory software suite) and MLFFs in the form of universal Python functions (callable by ASE). We will also test this interface extensively, using the CHGNet neural network force field to calculate thermodynamic properties of water via Grand Canonical Monte Carlo (GCMC). The final code package will be released as open source alongside detailed documentation and tutorials to attract community engagement and fast-track the adoption of AI in the MC simulation community.
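The core idea of plugging an MLFF into MC is that the sampler only needs a callable that returns an energy for a configuration. The sketch below is not DL_MONTE code and omits the GCMC insertion and deletion moves: it is a minimal canonical Metropolis loop with a pluggable `energy_fn`, tested on a harmonic well as a hypothetical stand-in for an MLFF. With an MLFF, the callable could, for example, update an ASE `Atoms` object with `set_positions` and return `get_potential_energy()` from an attached calculator.

```python
import math
import random
from typing import Callable, List

def metropolis_run(energy_fn: Callable[[List[float]], float], x0: List[float],
                   beta: float, max_disp: float, n_steps: int, seed: int = 0):
    """Canonical (NVT) Metropolis Monte Carlo with single-coordinate moves.

    energy_fn is any callable mapping coordinates to an energy; in an MLFF
    workflow this is where the machine-learned model would be queried.
    Returns the list of sampled configurations (one per step).
    """
    rng = random.Random(seed)
    x = list(x0)
    e = energy_fn(x)
    samples = []
    for _ in range(n_steps):
        trial = list(x)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-max_disp, max_disp)
        e_trial = energy_fn(trial)
        # Metropolis criterion: accept with probability min(1, exp(-beta * dE)).
        if e_trial <= e or rng.random() < math.exp(-beta * (e_trial - e)):
            x, e = trial, e_trial
        samples.append(list(x))
    return samples

def harmonic(x):
    """Hypothetical stand-in for an MLFF: harmonic well, unit force constant."""
    return 0.5 * sum(xi * xi for xi in x)

samples = metropolis_run(harmonic, [0.0], beta=1.0, max_disp=1.5, n_steps=100_000)
mean_x2 = sum(s[0] ** 2 for s in samples) / len(samples)
```

For this well, equipartition gives ⟨x²⟩ = 1/(βk) = 1, so `mean_x2` should come out close to 1; swapping `harmonic` for an MLFF call leaves the sampler unchanged, which is the property an MLFF-MC interface relies on.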
Project Outcomes
The project delivered a functional interface linking machine-learned force fields with Monte Carlo simulations via the DL_MONTE platform. This enables ab initio-level accuracy for simulations of gases and materials at significantly reduced computational cost. The developed server-client framework is flexible, scalable, and compatible with multiple MLFF tools, with open-source code and documentation provided. This represents a key step towards broader adoption of ML-driven simulation techniques in materials chemistry.
Next Steps
The team plans to publish the methodology and improve computational efficiency by securing additional GPU resources. Future work will focus on overcoming current performance limitations to enable more accurate and scalable simulations.
AI-Enabled Prediction of Lipid Membrane Composition from Optical Signatures
Dr. Miguel Paez Perez – Imperial College London
Prof. Marina Kuimova – Imperial College London
Project Summary
Lipid membranes play a key role in biology: they provide structural integrity, are central to intercellular communication, control content exchange, and transduce extracellular signals. Their functionality arises from their unique structure and biophysical properties, which are dictated by the interactions between the membrane’s constituent lipid molecules. Deregulation of these lipid-lipid interactions has been linked to diseases including cancer, malaria, Alzheimer’s disease, and atherosclerosis. From a commercial perspective, membrane composition and lipid-lipid interactions influence the efficacy of lipid-based drug carriers, such as mRNA vaccines. Yet there is limited understanding of how the lipid composition of complex, multi-component membranes affects their biophysical behaviour.
To address this challenge, we are developing tools capable of monitoring the biophysical features of lipid bilayers at high throughput and low cost. We will leverage an in-house high-throughput vesicle production device, optical readouts, and AI tools to gain otherwise inaccessible insight into how the chemistry of complex lipid membranes dictates their biophysical properties.
This project will generate a publicly available, curated dataset to facilitate the development of data-driven biophysical models, and we anticipate its outputs will find applications in areas including antimicrobial resistance research or artificial cell development, supporting key UK strategic areas like Engineering biology.
Project Outcomes
This project established a proof-of-concept for “optical lipidomics,” combining fluorescence-based sensing with AI to classify lipid membranes. By using multiple environmentally sensitive dyes, the team generated unique fluorescence fingerprints for biological samples, including cancer-derived vesicles. A novel vesicle production technology (“Spinlex”) was also developed to accelerate data generation. The project led to early-stage funding success and supported significant researcher upskilling in AI methods, alongside progress towards patenting the technology.
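The classification step above treats each sample as a vector of readouts, one value per dye. The AI models actually used are not described here, so the sketch below uses a deliberately simple nearest-centroid classifier as a stand-in. The membrane labels and normalised readouts are hypothetical.

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

def train_nearest_centroid(training):
    """training: class label -> list of fingerprint vectors (one value per dye)."""
    return {label: centroid(vecs) for label, vecs in training.items()}

def classify(model, fingerprint):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], fingerprint))

# Hypothetical normalised readouts from three environment-sensitive dyes.
training = {
    "fluid_membrane": [[0.20, 0.70, 0.40], [0.25, 0.65, 0.45]],
    "ordered_membrane": [[0.80, 0.30, 0.60], [0.75, 0.35, 0.55]],
}
model = train_nearest_centroid(training)
```

Using several dyes matters because each adds a dimension to the fingerprint, so membranes that look alike to one probe can still separate in the combined space.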
Next Steps
The team is expanding collaborations to apply this approach to antimicrobial resistance and cancer diagnostics. Future plans include pursuing funding from BBSRC, CRUK, and UKRI, as well as supporting fellowship applications to further develop and translate the technology.
AI-Enhanced Molecular Dynamics: Integrating Long-Range Interactions with Graph Neural Networks
Dr. Devis Di Tommaso – Queen Mary University of London
Assoc. Prof. Rachel Crespo-Otero – University College London
Prof. Greg Slabaugh – Queen Mary University of London
Project Summary
Molecular Dynamics (MD) is an essential computational tool for studying atomistic-level phenomena. However, methods like ab initio MD, which rely on density functional theory (DFT) to compute energies and forces, are computationally expensive. On the other hand, methods based on classical interatomic potentials (IP) offer speed but lack flexibility. In recent years, machine learning (ML) approaches have promised DFT-quality simulations at faster speeds, but they often neglect long-range interactions, which are crucial for accurately describing liquids, gas-solid and liquid-solid interfaces, and biomolecular systems. This project aims to develop AI-enhanced MD methods that integrate long-range interactions when predicting both energies and forces. The outcome will be the first version of a code implementing a Molecular Graph Transformer (MGT) to partially or fully replace the costly and time-intensive traditional calculations performed at each timestep of MD simulations. The model will compute atomic forces, energies, and changes in atomic positions throughout the simulation, enabling a more efficient and scalable approach to studying molecular systems. This will facilitate accelerated atomistic MD simulations that account for both local and long-range interactions.
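Why a finite cutoff matters: a message-passing network built on a radius graph only "sees" atoms within its cutoff, whereas the Coulomb interaction decays slowly as 1/r. The sketch below is not the project's architecture; it simply compares a full pairwise Coulomb sum with a truncated one in reduced units (Coulomb constant set to 1) for three hypothetical point charges on a line. Periodic systems would need Ewald-type summation rather than this direct sum.

```python
import itertools
import math

def coulomb_energy(charges, positions, cutoff=None):
    """Pairwise Coulomb energy sum_{i<j} q_i q_j / r_ij in reduced units.

    If cutoff is given, pairs farther apart than the cutoff are ignored,
    mimicking a purely local (radius-graph) model.
    """
    energy = 0.0
    for i, j in itertools.combinations(range(len(charges)), 2):
        r = math.dist(positions[i], positions[j])
        if cutoff is None or r <= cutoff:
            energy += charges[i] * charges[j] / r
    return energy

# Hypothetical +1, -1, +1 charges spaced one unit apart along x.
charges = [1.0, -1.0, 1.0]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
full = coulomb_energy(charges, positions)               # includes the 1-3 pair
local = coulomb_energy(charges, positions, cutoff=1.5)  # drops the 1-3 pair
```

The full sum is -1 - 1 + 0.5 = -1.5, while the truncated sum is -2.0; the missing +0.5 is exactly the kind of long-range contribution a purely local model cannot recover.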
Project Outcomes
The project developed a novel equivariant graph neural network architecture (BAEE) that integrates long-range electrostatic interactions into molecular dynamics simulations while maintaining computational efficiency. This represents a significant methodological advancement beyond the original Molecular Graph Transformer concept. A manuscript detailing the framework is in preparation, with planned submission in early 2026. The work also strengthened interdisciplinary collaboration and upskilling across AI and computational chemistry, with foundations laid for future open-source release.
Next Steps
The team will further develop and benchmark the method across diverse datasets, followed by open-source dissemination. Plans include submitting an EPSRC proposal and advancing the GRIP framework for inverse materials design in collaboration with industry partners such as IBM.
Bayesian Optimization for Accelerating Metal-Based Antibiotic Discovery
Dr. Angelo Frei – University of York
Dr. David Husbands – University of York
Dr. Athi Welsh – University of York
Project Summary
Antimicrobial resistance to current treatments poses a growing threat to global healthcare. At the same time, the antibiotic development pipeline remains perilously stagnant. This project aims to accelerate the discovery of novel metal-based antibiotics by integrating machine learning (ML) with high-throughput chemical synthesis and biological evaluation. In collaboration with Atinary Technologies, we will use Bayesian Optimization, with ML models trained on our chemical libraries, to predict and iteratively refine iridium(III) metalloantibiotics, maximizing antibacterial potency while minimizing toxicity.
Metal-based antibiotics offer unique structural and functional advantages over organic compounds, yet their discovery remains slow, partly because of the vast chemical space available. Traditional methods of structure-activity relationship elucidation are time-intensive and inefficient. By training an ML model on 1440 iridium(III) complexes, we will virtually screen ~400 million potential compounds from combinations of building blocks, dramatically increasing both the speed of hit identification and the hit rate. From this virtual screen, two iridium(III) libraries will be synthesized and evaluated using an automated Opentrons system.
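In Bayesian optimisation, a surrogate model's predicted mean and uncertainty for each unmade compound are turned into a ranking by an acquisition function. The model and acquisition function used with Atinary are not specified here, so the sketch below shows expected improvement (EI), one common choice, for a single hypothetical score to maximise. The real objective balances potency against toxicity; the candidate names and values are illustrative only.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement (maximisation) under a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf

# Hypothetical surrogate predictions (mean, std) of a single activity score.
candidates = {"Ir-001": (0.62, 0.05), "Ir-002": (0.55, 0.20), "Ir-003": (0.70, 0.01)}
best_so_far = 0.65
ranked = sorted(
    candidates,
    key=lambda c: expected_improvement(*candidates[c], best_so_far),
    reverse=True,
)
```

Note that `Ir-002` outranks `Ir-001` despite a lower predicted mean, because its larger uncertainty gives more chance of beating the current best; this exploration term is what lets iterative design cycles escape already-known regions of chemical space.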
Overall, this project is anticipated to yield the following outcomes:
- Identification of 10 lead iridium(III) complexes with high antibacterial activity and low cytotoxicity for further biological evaluation.
- Development of an ML-driven approach to optimize metal-based drug discovery.
- Strengthening of industry-academic collaboration with Atinary Technologies.
- Publication of research findings to advance AI-driven drug discovery.
This project leverages AI for chemical innovation, setting the stage for future applications in metallodrug development for other ailments.
Project Outcomes
This project successfully applied AI-driven Bayesian optimisation to explore a vast chemical space of iridium(III) complexes for antibiotic discovery. In collaboration with Atinary, the team developed a novel predictive model trained on bioactivity data, enabling efficient screening across 400 million potential compounds. Two iterative design cycles led to the synthesis and evaluation of 360 new complexes, identifying several promising candidates with strong antibacterial activity and acceptable toxicity. The project also provided valuable upskilling opportunities through direct engagement with an AI industry partner. A research publication is currently in preparation.
Next Steps
Future work will focus on expanding datasets to enable predictions across a wider range of metals and improving model robustness. The team is exploring further funding opportunities, including ERC grants, to continue collaboration with Atinary and advance this approach.
Computer Vision for Predicting the Impact of Additives in Protein Crystallisation
Prof. Bao Nguyen – University of Leeds
Dr. Briony Yorke – University of Leeds
CaiYun Ma – University of Leeds
Dr. Halina Mikolajek – Diamond Light Source
Project Summary
The rise of new drug discovery modalities has underscored the need for efficient macromolecular crystallization, both as a key characterization technique and as a greener alternative to traditional purification methods in manufacturing. However, the weak intermolecular interactions in protein crystals often lead to instability, necessitating the use of additives that influence protein binding, ionic strength, or nucleation. While standardized crystallization screens are widely used, the underlying intermolecular interactions and nucleation/growth mechanisms remain poorly understood, with only limited systematic studies on a small subset of proteins.
This project aims to address these challenges using AI-driven computer vision to classify and extract morphological data from microscopic images of protein crystallization screens, sourced from the VMXi beamline at Diamond Light Source. The outcomes will be an automated workflow for crystal characterization across diverse sources; a robust dataset of microscopic images, including negative results; and AI models trained to predict optimal crystallization conditions and additives. By improving crystallization success rates, this approach advances both protein characterization and scalable purification. Furthermore, the same AI-assisted image processing techniques can be extended to other materials, broadening their impact beyond biomolecular systems.
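Once a model has segmented a crystal in a microscopy image, simple morphological descriptors can be read off the resulting binary mask. The sketch below is a hypothetical, minimal example assuming a single object per mask; it computes pixel area and bounding-box aspect ratio in pure Python. A fuller pipeline would typically use an image-analysis library, for example scikit-image's `regionprops`, which reports many more shape properties.

```python
def region_descriptors(mask):
    """Area and bounding-box aspect ratio of the foreground pixels (value 1)
    in a binary mask given as a list of rows. Assumes a single object."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        return {"area": 0, "aspect_ratio": None}
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return {
        "area": len(pixels),
        # Long side over short side of the bounding box, so always >= 1.
        "aspect_ratio": max(height, width) / min(height, width),
    }

# Hypothetical segmentation of an elongated, needle-like crystal.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
desc = region_descriptors(mask)
```

Descriptors like these can then be joined with the experimental metadata (additive, conditions) to study how additives shift crystal habit.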
Project Outcomes
This project focuses on improving protein crystallisation analysis using artificial intelligence, through two main stages: data curation and model development. In the first stage, we curated and annotated over 3,000 microscopy images from protein crystallisation experiments at Diamond Light Source. Each image was labelled with detailed experimental metadata and crystal characteristics, creating one of the most comprehensive datasets of its kind. This dataset is available to the community, and will be open-access when the manuscript is published.
In the second stage, the dataset was used to train and evaluate AI models for crystal detection. Initial results are promising, achieving 83% classification accuracy. More advanced approaches using large-scale transfer learning were explored but require further computational resources to complete.
Next Steps
Looking ahead, we are seeking additional HPC resources to continue model development and improve performance. Collaborations with AI experts and partners are in place to support the next phase, which will focus on developing more advanced models to predict crystallisation success based on protein structure and experimental conditions.
Further work will also involve engagement with industry partners to expand the project and accelerate real-world application. Additional publications and continued development are planned.
Enabling Data-Driven Discovery and Reaction Optimisation in Porous Organic Cage Synthesis
Dr. Benjamin D Egleston – Imperial College London
Dr. Rebecca Greenaway – Imperial College London
Project Summary
Porous Organic Cages (POCs) are a class of molecular materials with tunable micropore structures that offer significant potential in separation technologies. Recently, our lab has been implementing machine learning tools to assess the accessibility of POCs by encoding chemists’ intuition (doi.org/10.1021/acs.jcim.1c00375). However, traditional methods for synthesising these materials and analysing their reaction mixtures are limited by the complexity of self-assembly and the unintuitive nature of the species formed. To address this challenge, the project will integrate robotic liquid handling and parallel synthesis with automated data processing and analysis to generate large experimental datasets, providing structural information and thermodynamic data on these complex systems for machine learning or data-driven applications.
Building on recent progress in automated high-throughput screening for combinatorial synthesis of metal-organic cages (doi.org/10.26434/chemrxiv-2024-hl427-v4) and POCs (doi.org/10.1039/D3SC06133G), the project extends these methodologies to even more complex systems. The project will centre on identifying unintuitive structures and intermediates in reaction mixtures by generating large libraries of potential molecules to search for in mass spectrometry (MS) data. This will be combined with automated kinetic sampling and analysis of MS data from parallel reactions to enable mapping of entire reaction spaces.
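Generating a candidate library for MS matching can be sketched for imine cages, where each imine bond forms with loss of one water. The example below assumes illustrative precursors (1,3,5-triformylbenzene with three aldehydes and ethylenediamine with two amines, which together can form a [4+6] imine cage), enumerates [a+b] species with k condensations, computes [M+H]+ m/z values from monoisotopic masses, and matches observed peaks within a ppm tolerance. The library rules are deliberately permissive and are not the project's actual tools.

```python
# Monoisotopic masses (u) and the proton mass (approximate).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727647

def mass(formula):
    """Monoisotopic mass of a formula given as an element -> count dict."""
    return sum(MONO[el] * n for el, n in formula.items())

# Illustrative precursors: (formula, number of reactive groups).
ALDEHYDE = ({"C": 9, "H": 6, "O": 3}, 3)  # 1,3,5-triformylbenzene, 3 CHO
AMINE = ({"C": 2, "H": 8, "N": 2}, 2)     # ethylenediamine, 2 NH2
WATER = mass({"H": 2, "O": 1})

def candidate_library(max_ald=4, max_amine=6):
    """[a+b] species with k imine condensations (one water lost per imine).
    k runs from a+b-1 (minimum to connect all units) up to the number of
    limiting reactive groups. Returns ((a, b, k), [M+H]+ m/z) pairs."""
    (f_ald, n_cho), (f_am, n_nh2) = ALDEHYDE, AMINE
    lib = []
    for a in range(1, max_ald + 1):
        for b in range(1, max_amine + 1):
            for k in range(a + b - 1, min(a * n_cho, b * n_nh2) + 1):
                m = a * mass(f_ald) + b * mass(f_am) - k * WATER
                lib.append(((a, b, k), m + PROTON))
    return lib

def match_peaks(observed_mz, library, ppm=5.0):
    """Library species whose [M+H]+ lies within `ppm` of any observed peak."""
    hits = []
    for label, mz in library:
        if any(abs(mz - obs) / mz * 1e6 <= ppm for obs in observed_mz):
            hits.append(label)
    return hits

library = candidate_library()
hits = match_peaks([793.4198], library)
```

The fully condensed [4+6] cage (12 imine bonds, C48H48N12) has [M+H]+ of about 793.4198, so it appears among the hits; partially condensed intermediates differ by multiples of 18.0106 u and can be picked out the same way in time-resolved data.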
One key goal of this project is to demonstrate that automating the discovery process, from reaction preparation to data interpretation, can accelerate the identification of novel materials. Generating much greater volumes of detailed data will allow a deeper understanding of these complex systems. The resulting data-driven foundation will accelerate the discovery of novel POCs and other structures that are challenging to predict using traditional intuition.
Project Outcomes
This project advanced high-throughput experimentation for porous organic cage discovery by developing automated LC-MS workflows and data analysis tools. Two Python-based tools enabled efficient analysis of complex reaction mixtures and are already being adopted within the research group. The work facilitated large-scale screening of ~1,000 reaction combinations, leading to the identification of novel cage structures and supporting upcoming publications. The project also introduced automated sampling methods to generate time-resolved datasets, significantly enhancing data-driven discovery capabilities.
Next Steps
Future efforts will focus on integrating these workflows into self-optimising experimental platforms, including Bayesian optimisation approaches. Promising cage candidates will be scaled up and further characterised, alongside continued development of predictive models and collaborative research activities.
Exploration of defect superstructure phase diagrams in graphene with Bayesian AI
Dr. Lukas Hoermann – University of Warwick
Prof. Reinhard J. Maurer – University of Warwick
Dr. David Andrew Duncan – University of Nottingham
Dr. Alexander Saywell – University of Nottingham
Dr. Christopher Allen – University of Oxford
Project Summary
The atom-scale design of two-dimensional materials, particularly defective graphene, shows great promise for catalysis, sensing, and energy storage. By integrating experimental growth and analysis with Bayesian-AI-enabled configuration space prediction via the SAMPLE code, we will lay the groundwork for the experimental design to efficiently explore the phase diagram of defective graphene. This project will uncover how experimental parameters—temperature and gas flux—influence the formation of defect superstructures in graphene that govern its electronic and mechanical properties.
Using SAMPLE, we will generate a comprehensive phase space of hundreds of millions of defect superstructures and efficiently predict their formation energies. This dataset will be available on the NOMAD database. We will calibrate the theoretical phase diagram using TEM and AFM images of N-defects in graphene from our collaborators David Duncan, Alexander Saywell (University of Nottingham), and Christopher Allen (Diamond Light Source, University of Oxford). By mapping the experimental structures with a tessellation code (Duncan and Saywell) and computing their formation energies with SAMPLE, we will place these structures within the phase diagram. Using Bayesian-AI, we will learn the functional dependence of the N-concentration and defect composition on the deposition temperature and gas flux during sample preparation. This will enable the prediction of defect patterns at a given deposition temperature and guide future experiments to achieve graphene layers with targeted defect superstructures. The developed approach will be broadly applicable to any defective two-dimensional material or surface, offering a versatile framework for precision surface engineering in a range of applications.
Project Outcomes
We generated comprehensive datasets of defective graphene superstructures, including graphitic N- and B-defects as well as tripyridinic N-defects. While our focus was on graphene with N heteroatom defects, the methodology was designed to serve as a versatile framework for engineering defects in any 2D material or surface, opening new avenues for precision surface design. By leveraging Bayesian Optimisation methods, trained on DFT formation energies, we successfully predicted the formation energies of approximately 160 million structures across the aforementioned systems.
The dataset is publicly accessible on the NOMAD database.
The work initiated by this project has continued beyond its conclusion, and the results have now been published on arXiv.
Next Steps
Inverse design of growth – Based on the SAMPLE approach, we aim to develop a user-friendly approach to predict the experimental conditions, such as temperature, pressure, and chemical environment, needed to grow graphene with specific defect properties. This inverse design approach has the potential to replace trial and error with computational guidance of growth experiments.
Learning the Hamiltonian of N-doped graphene – We have already started to develop a data-driven approach to learn the electronic Hamiltonian for N-doped graphene, using DFT and the MACE-H neural network potential. We aim to model how defects influence the material’s electronic and spectroscopic properties.
Learning forces without forces – We are testing whether the MACE-MP0 foundation model can be retrained to predict accurate atomic forces by using only our SAMPLE formation energy database.
High-Throughput Data-Driven Electrolyte Design to Enable Lithium Metal Batteries
Dr. Neubi Xavier – University of Surrey
Dr. Matthias Golomb – University of Surrey
Project Summary
Rechargeable batteries are a major part of our everyday lives, and improving them further is crucial for future technology. The gold standard for high-performance next-generation batteries is the use of lithium metal as the anode material. One of the major bottlenecks to enabling lithium metal batteries is the increased reactivity between current electrolyte formulations and lithium, which leads to uncontrollable side reactions during operation and ultimately causes battery failure.
Researchers are currently focusing efforts on engineering new electrolyte formulations, leading to hundreds of scientific papers being published each week. The amount of data generated makes it impossible for a single researcher to follow all available literature and hinders the rational design of new electrolytes.
In this project, Dr. Neubi Xavier and Dr. Matthias Golomb aim to collate this vast amount of data into an accessible database that will establish clear reporting standards and serve battery scientists, computational chemists, and AI researchers as a starting point for further experimental and computational investigations. Using large language models, they aim to extract property information on lithium metal electrolytes from a wide range of available scientific literature and identify common core descriptors for high-performing candidates. In addition, they will combine high-throughput atomistic simulations and machine learning to fill gaps in the resulting database, aiming to create the most complete and standardized picture of the lithium metal-compatible electrolyte research landscape to date.
Project Outcomes
We have developed and released two open-source software tools to support data-driven materials research. The first enables automated extraction of scientific data from literature using large language models, making it broadly applicable across research fields.
The second streamlines the setup of electrolyte simulations, significantly reducing time and potential errors by simplifying input requirements. https://github.com/neubifx/Battflow
Alongside these tools, we are creating a standardised dataset that combines experimental data with computational results generated through our automated workflows. A dedicated web interface is also in development to provide access to this resource.
Next Steps
We have approached and established collaborations with industry experimental partners to verify the predictions of the database analysis. Visits to their premises have been arranged to discuss the codes and the machine learning predictions of novel electrolytes generated using data from this project, as well as to provide training for engineering staff. Furthermore, we are in the process of deploying a web interface for the standardization of low-level computational electrolyte analysis, making it accessible to experimental researchers and supporting the standardized reporting of computational results in electrochemistry.
Transforming Chemistry Labs with Safe and Intuitive Human-in-the-Loop Robotic Systems
Dr. Luis Figueredo – University of Nottingham
Dr. Ayse Kucukyilmaz – University of Nottingham
Dr. Gabriella Pizzuto – University of Liverpool
Project Summary
This project aims to transform chemistry labs through robotics and AI—overcoming adoption barriers while enhancing safety and efficiency. We’ll develop a framework that empowers chemists to intuitively teach robots experimental tasks via multimodal demonstrations, eliminating the need for programming expertise while ensuring stringent safety for seamless human-in-the-loop (HIL) operation. Our approach leverages generative AI for semantic scene understanding, grounded in model-based representations to enhance explainability and safety. This enables robots to interpret dynamic lab environments and manipulate glassware and hazardous substances. A certified safety layer ensures compliance with strict standards, advancing HIL automation in chemistry and aligning with AIchemy’s mission to foster intuitive, high-trust robotics in scientific research.
The automation of chemistry labs remains challenging due, among other reasons, to the requirements for precise and safe manipulation of hazardous substances, diverse glassware, and evolving experimental setups. Traditional robotic solutions require extensive programming expertise, limiting accessibility. Our approach leverages multimodal human demonstrations—combining kinaesthetic, visual, and haptic inputs—to develop constraint-based robotic behaviours that chemists can intuitively guide. Certified safety layers ensure secure robotic handling of hazardous liquids, enabling seamless human-robot collaboration in high-stakes lab environments.
Project Outcomes
This project demonstrated that our SpillNot-inspired trajectory optimisation approach, originally developed for liquids, can be successfully extended to granular materials, reducing spillage during robotic transport by up to 85%.
Using a robotic arm system, we tested a range of common granular materials with different flow properties. Across all cases, the optimised control method significantly reduced material loss compared to standard motion planning approaches, typically achieving reductions of 50–80%. These results show that effective spill reduction can be achieved even without detailed models of granular dynamics.
In addition to these experimental results, the project delivered a reusable software and experimental pipeline for safe robotic handling of granular materials. The system was tested across multiple research sites, demonstrating strong reproducibility and potential for broader adoption in automated laboratory environments. A prototype virtual environment was also developed to support human-in-the-loop control, including early work on AI-driven safety features.
The project has also contributed to skills development, training researchers in safe robotic automation and advanced control methods. Methodological documentation, code modules, and the granular-transport dataset will form the basis for a first joint conference submission and a video demonstrator showcasing the collaboration and experimental findings.
Next Steps
Looking ahead, the next phase will expand the range of materials and experimental conditions, explore more complex and safety-critical scenarios, and extend the framework towards controlled dispensing of solid materials. Further development will also focus on intelligent, user-guided robotic systems, integrating AI to improve safety, usability, and interaction in laboratory settings.
These efforts will underpin a series of planned publications and future funding proposals, while strengthening collaboration in the development of safe, AI-driven laboratory automation.
X-GAMES: Crystallography with Machine Learning
Prof. Craig Butts – University of Bristol
Dr. Calvin Yiu – University of Bristol
Project Summary
We will build a proof-of-principle generative AI tool, X-GAMES, to identify chemical structures directly from powdered samples by combining NMR spectroscopic and X-ray diffraction data. This is of significant value to the pharmaceutical industry, where the chemical structure of molecules and their packing in crystals control their drug properties.
Existing generative AI methods are very good at creating a myriad of images or text on a generalised subject and can also be taught to create molecules that fit broad characteristics, e.g. “make me a molecule that might be drug-like”. However, generative structure determination is a much harder challenge, as it requires generating the one-and-only chemical structure that fits uniquely to a particular set of spectroscopic data. At Bristol, we have developed early prototype systems capable of doing this, albeit only for molecules with a few atoms, based on ‘inverting’ a neural network version of our existing IMPRESSION machine learning architecture, which was designed to predict solution-state NMR parameters. Essentially, this prototype predicts structures from spectra rather than spectra from structures.
The goal of this X-GAMES project is to build out this prototype so that it works for larger, more complex drug-like molecules. To achieve this, we will train X-GAMES on 10–100× larger datasets that integrate X-ray diffraction data as well as the NMR data that our prototype is already designed to use.
Project Outcomes
We have made strong progress in developing innovative tools that combine advanced spectroscopy with artificial intelligence. Our team successfully created prototype methods integrating solid-state NMR data with machine learning predictions, alongside exploring new ways to represent structural information from X-ray data.
As part of the project, one of our researchers developed expertise in multi-modal AI and has since been recruited by our project partner, AstraZeneca, into their machine learning team, highlighting the strength and impact of the training environment.
We are currently preparing research publications focused on our inverse-IMPRESSION approach, which will contribute to the broader development of the X-GAMES framework. Alongside this, we have expanded our technical approach beyond graph-based methods to include emerging architectures such as large language models and diffusion models, ensuring flexibility and future scalability.
Next Steps
Looking ahead, we are actively pursuing additional funding opportunities, including from EPSRC and ERC, to further advance this work. We are also building collaborations with leading international researchers to establish shared datasets and benchmarking standards. While development continues, we remain committed to progressing key elements of the X-GAMES framework through ongoing research activities within our group.
A Computer Vision Approach to Understanding Polymer Swelling Kinetics
Alex McKissock
Host University: University of Strathclyde
Host Academic: Dr. Marc Reid
This project focused on studying gallium deformation in acidified salt solutions and analysing the movement with Kineticolor computer vision software.
Alex’s internship project successfully adapted Kineticolor computer vision software to investigate gallium metal locomotion, building on recent research in liquid metal chemistry. He developed expertise in computer vision analysis by studying published examples and producing high-quality experimental videos using prepared salt solutions. His findings revealed distinct stages of gallium deformation and showed that acidity accelerates the process, with Galinstan breaking down in acidified silver nitrate. The project demonstrated the potential of computer vision in studying dynamic materials and highlighted its relevance to soft robotics and chemistry education.
By working in a research lab, Alex gained practical skills in experimental design, column handling, and understanding the selectivity of various amine sensors. He also learned to optimise camera and lighting setups to overcome challenges posed by reflective liquid metals, and explored the versatility of Kineticolor’s analysis methods. The project reinforced key chemistry concepts such as redox reactions, electrochemistry, and precipitation, deepening his theoretical and practical understanding.
A Fast and Justifiable AI approach to characterising hydrogen bonds
Jack Gallimore
Host University: University of Liverpool
Host Academic: Dr. Olga Anosova
The aim of this project was to develop a more transparent and geometrically justified method for identifying hydrogen bonds in proteins, with a focus on detecting and classifying common secondary structures such as helices. Unlike traditional tools such as DSSP, Jack’s approach sought to eliminate reliance on manually set thresholds, enabling robust and consistent analysis across all experimental structures in the Protein Data Bank.
At the start of Jack’s project, he explored various geometric features to reliably identify hydrogen bonds in protein structures, focusing on relationships between residues i and i+3, i+4, and i+5. After analysing distributions across the Protein Data Bank, he established a transparent rule based on oxygen–nitrogen distance and C–O–N angle, which showed strong predictive power for secondary structure formation. He plans to continue refining these thresholds with Dr Olga Anosova, aiming to publish the work later this year.
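A rule of this kind can be sketched in Python as a purely geometric check on backbone atoms. The distance and angle cut-offs below are placeholders, not the thresholds Jack derived, and the input format (a list of residues with C, O and N coordinates in ångströms) is an assumption for illustration.

```python
import math

# Placeholder cut-offs for illustration only; NOT the thresholds derived in the project.
MAX_O_N_DISTANCE = 3.5   # Angstrom
MIN_C_O_N_ANGLE = 120.0  # degrees


def angle_deg(a, b, c):
    """Angle a-b-c at vertex b, in degrees."""
    v1 = [ai - bi for ai, bi in zip(a, b)]
    v2 = [ci - bi for ci, bi in zip(c, b)]
    cos_theta = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def backbone_hbonds(residues, offsets=(3, 4, 5)):
    """Return (i, j) pairs where C=O of residue i passes the rule with N of residue j = i + k.

    residues: list of dicts with backbone "C", "O" and "N" coordinates (x, y, z).
    """
    pairs = []
    for i, res in enumerate(residues):
        for k in offsets:
            j = i + k
            if j >= len(residues):
                continue
            c, o, n = res["C"], res["O"], residues[j]["N"]
            if math.dist(o, n) <= MAX_O_N_DISTANCE and angle_deg(c, o, n) >= MIN_C_O_N_ANGLE:
                pairs.append((i, j))
    return pairs
```

Counting which offsets (i+3, i+4 or i+5) dominate along a chain is then a simple way to separate 3₁₀-, α- and π-helical patterns, in the spirit of the residue relationships described above.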
Jack learned to critically structure and produce research-grade work while handling noisy, large-scale data spanning 100,000 PDB files. He gained practical experience with Python libraries and efficient coding techniques, deepening his understanding of protein geometry and hydrogen bonding through hands-on analysis and visualization.
Computational Design of New Chiral Metal Halide Semiconductors Using High-Throughput and Machine Learning
Menna Shirras
Host University: Lancaster University
Host Academic: Dr. Nourdine Zibouche
This project aimed to design new chiral metal-halide semiconductors (CMHS) using machine learning techniques. This involved exploring chemical space with USPEX, compiling structural data, performing DFT calculations, applying supervised ML models to predict bandgaps, and refining promising candidates through high-level computations.
A curated library of 136 CMHS structures was compiled from the literature and expanded through substitutions, with additional hypothetical structures attempted via USPEX, though this was limited by time and computational resources. Electronic data were generated using loose SCF calculations in Quantum ESPRESSO, and inconsistencies in the literature data led to the use of a larger, cleaner dataset of 240 organometal halide perovskites for machine learning analysis. Three supervised regression models (Linear Regression, Decision Tree, and Random Forest) were implemented and tuned using scikit-learn to predict bandgap values, with performance evaluated using MAE, MSE, and R² metrics.
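For reference, the three evaluation metrics can be written out directly. This is a minimal pure-Python sketch of the standard definitions (the same ones behind scikit-learn’s mean_absolute_error, mean_squared_error and r2_score); the bandgap values in the example are made up.

```python
def regression_metrics(y_true, y_pred):
    """MAE, MSE and R^2 for paired true/predicted values (e.g. bandgaps in eV)."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variance in the targets
    ss_res = sum(r * r for r in residuals)               # unexplained variance
    return {"MAE": mae, "MSE": mse, "R2": 1 - ss_res / ss_tot}


if __name__ == "__main__":
    # Hypothetical bandgaps (eV), for illustration only.
    print(regression_metrics([1.5, 2.0, 2.5, 3.0], [1.6, 1.9, 2.7, 2.8]))
```

MAE and MSE are in the units of the target (eV and eV²), whereas R² is dimensionless, which is why reporting all three gives a fuller picture of model quality than any one alone.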
The project allowed Menna to gain practical experience with Python libraries and machine learning algorithms, applying them to materials design and data analysis. She developed strong skills in data visualization and model evaluation, and now aims to expand her knowledge by exploring unsupervised ML techniques and applying her models to larger datasets.
Develop an AI data audit tool to help with data ingestion for the Physical Data Science Data collection
Oscar Robinson
Host University: University of Southampton
Host Academic: Dr. Matthew Partridge and Dr. Samantha Pearman-Kanza
The aim of this project was to perform an initial investigation into the development of an AI-driven data audit tool to streamline the ingestion of datasets into the Physical Chemistry Properties Data Collection (PChProp). The tool needed to be able to analyse submitted datasets, identify inconsistencies or missing metadata, and propose corrections or canonical mappings to align with community standards and support ingestion to the database where the collection is hosted.
Oscar’s project has produced a terminal-driven Python script called SAM, the Solubility Audit Manager. This tool is a command-line chatbot that uses a local HuggingFace LLM, heuristics, and natural language (NL) parsing to guide the user through the ingestion stages of building a solubility database.
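SAM’s internals are not described here, so the following is only a hypothetical sketch of the heuristic and regular-expression side of such an audit: check a solubility record for missing fields, flag values in unrecognised formats, and propose a canonical unit mapping. The field names, the accepted units and the kelvin convention are all assumptions, not PChProp’s actual schema.

```python
import re

# Hypothetical schema for a solubility record; not PChProp's actual schema.
REQUIRED_FIELDS = ("compound", "solvent", "temperature", "solubility")

# Accepted formats are assumptions, e.g. "12.5 mg/mL", "0.3 mol/L", "1e-3 mol/kg".
SOLUBILITY_RE = re.compile(
    r"^\s*[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?\s*(?:mg/mL|g/L|mol/L|mol/kg)\s*$"
)
TEMPERATURE_RE = re.compile(r"^\s*(?P<value>[-+]?\d+(?:\.\d+)?)\s*(?P<unit>K|°C)\s*$")


def audit_record(record):
    """Return a list of human-readable issues and suggested corrections."""
    issues = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            issues.append(f"missing field: {field}")

    solubility = record.get("solubility")
    if solubility and not SOLUBILITY_RE.match(solubility):
        issues.append(f"unrecognised solubility format: {solubility!r}")

    temperature = record.get("temperature")
    if temperature:
        match = TEMPERATURE_RE.match(temperature)
        if match is None:
            issues.append(f"unrecognised temperature format: {temperature!r}")
        elif match["unit"] == "°C":
            # Canonical mapping (assumed convention): store temperatures in kelvin.
            kelvin = float(match["value"]) + 273.15
            issues.append(f"suggest converting temperature to {kelvin:.2f} K")
    return issues


if __name__ == "__main__":
    record = {"compound": "example-A", "temperature": "25 °C", "solubility": "about 12 mg/mL"}
    for issue in audit_record(record):
        print(issue)
```

In a tool like the one described, an LLM or NL parser would sit in front of checks like these, turning free-text answers into structured fields before the deterministic rules run.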
Additionally, he learnt to write in a Pythonic fashion, moving from Jupyter Lab and .ipynb notebooks to multi-file projects developed in PyCharm. He also gained experience applying local language models, natural language parsing, and regular expressions to import and process chemical data.
Exploring materials space with optimal transport
Zibo Zhou
Host University: University College London
Host Academic: Dr. Keith Butler
The project aimed to develop a systematic, data-driven framework for analogy-based materials discovery by integrating structural and compositional data using fused Gromov-Wasserstein (FGW) distance. This approach was specifically applied to estimate the spectroscopic limited maximum efficiency (SLME) of materials, enabling more informed predictions in materials science.
Zibo’s project demonstrated that the fused Gromov-Wasserstein (FGW) approach performs well in materials discovery tasks with small training sets, achieving results comparable to advanced models such as ALIGNN and CrabNet. Although FGW’s predictive accuracy was highly dependent on the choice of reference material, the approach is especially useful when data are scarce but reliable predictions are needed. In contrast, ALIGNN and CrabNet showed superior performance as training set size increased, offering practical guidance for model selection based on data availability.
Through this project, Zibo gained a strong understanding of optimal transport theory and the fused Gromov-Wasserstein (FGW) distance, learning how it integrates structural and compositional data for materials prediction. He developed insight into the SLME property in photovoltaics and learned to evaluate model performance using various metrics across different training set sizes. His coding and computational skills improved significantly, enabling him to build efficient data pipelines and analyse large datasets. Additionally, he strengthened his collaboration and project management skills while working within a research group.
Human-AI Collaborative Closed-Loop Optimization Online Platform Development
Maxime Atkinson
Host University: University of Liverpool
Host Academic: Dr. Xenofon Evangelopoulos
The project focused on gaining an introduction to machine learning for closed-loop chemical discovery, with an emphasis on algorithmic benchmarking and orientation within the chemical discovery process.
Maxime developed and applied machine learning techniques, including Bayesian optimization and neural networks, to support chemical discovery tasks. He built and tested optimization algorithms using various surrogate models and acquisition functions on both real and synthetic datasets. Additionally, he implemented Physics-Informed Neural Networks (PINNs) for voltammetry modeling and produced functional codebases for predictive modeling, optimization loops, and data analysis in chemistry.
Through this project, he gained mastery of key tools including Git, VS Code, scikit-learn, BoTorch, Keras, and TensorFlow, while developing a deep understanding of Bayesian optimization, surrogate models, acquisition functions, and constraints. He learned to structure machine learning projects from data preprocessing to deployment and built skills in neural network architectures, ensemble methods, and Physics-Informed Neural Networks (PINNs) for scientific applications. Additionally, he gained insights into the chemistry and mathematics of voltammetry, as well as valuable experience in research collaboration and academic career development.
Impact of finer k-point sampling on vibrational free energy and crystal structure ranking accuracy
Leo Arogundade
Host University: University of Southampton
Host Academic: Prof. Graeme Day
Leo’s project built on his third-year dissertation by exploring how finer k-point sampling influences vibrational free energy calculations in a crystal structure prediction (CSP) workflow. By improving the accuracy of the energy data, the work sought to improve the reranking of predicted crystal structures and to assess the trade-off between computational cost and prediction reliability.
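For context, in the harmonic approximation the vibrational free energy is a weighted sum over phonon modes ν at each sampled wavevector k, F_vib = Σ_k w_k Σ_ν [ħω_kν/2 + k_BT ln(1 − exp(−ħω_kν/k_BT))], so a finer k-point mesh changes which frequencies enter the sum. A minimal Python sketch of that sum follows; it assumes frequencies in cm⁻¹ and k-point weights that sum to 1, and it is not the CSP code used in the project.

```python
import math

H_C = 1.98644586e-23  # h*c in J per cm^-1
K_B = 1.380649e-23    # Boltzmann constant, J/K
N_A = 6.02214076e23   # Avogadro constant, 1/mol


def mode_free_energy(wavenumber, temperature):
    """Harmonic free energy (J) of one mode: zero-point term plus thermal term."""
    energy = H_C * wavenumber
    zero_point = 0.5 * energy
    if temperature <= 0:
        return zero_point
    x = energy / (K_B * temperature)
    return zero_point + K_B * temperature * math.log1p(-math.exp(-x))


def vibrational_free_energy(kpoints, temperature):
    """F_vib in kJ/mol (per mole of cells) from [(weight, [wavenumbers in cm^-1]), ...].

    Weights are assumed to sum to 1. Non-positive frequencies (acoustic modes at
    Gamma, imaginary modes) are skipped, a common but not universal convention.
    """
    total = 0.0
    for weight, wavenumbers in kpoints:
        total += weight * sum(mode_free_energy(w, temperature) for w in wavenumbers if w > 0)
    return total * N_A / 1000.0
```

Because low-frequency modes contribute the largest (most negative) thermal terms, how densely the mesh samples the low-frequency region of the Brillouin zone can shift F_vib enough to reorder structures that lie close in energy, such as polymorphs.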
The project successfully extended the analysis to an additional 70 crystal landscapes, significantly broadening the dataset beyond the initial dissertation work. The results showed that 16 crystals experienced an improvement in their ranking. In contrast, 29 crystals saw a worsening of their ranks, while 25 crystals maintained their original rankings. This highlighted the nuanced role of vibrational energy in crystal stability, with polymorphs showing the most inconsistent rerankings due to their subtle energetic differences. These findings suggest that while refined calculations can enhance accuracy, they may also introduce variability, especially for polymorphs. Overall, the project indicated that vibrational energy metrics could complement existing CSP workflows to improve prediction reliability.
Leo deepened his understanding of computational chemistry and crystal structure prediction theory, gaining new skills in vibrational energy calculations, data analysis, and scientific programming. He became confident in handling independent research challenges and learned how theoretical models and practical data intersect in materials discovery. Working with advanced computational methods also strengthened his ability to evaluate model performance and manage complex workflows.
Machine Learning acceleration of metadynamic simulations of antimicrobial peptides
Alexandre Peuch
Host University: Imperial College London
Host Academic: Dr. Jarvist Frost
Alexandre’s project aimed to accelerate metadynamic simulations of antimicrobial peptides (AMPs) using machine learning techniques.
In the first part of the project, he developed a Monte Carlo algorithm based on amino acid coupling efficiencies to predict likely byproducts in peptide synthesis, guiding experimental efforts to identify active compounds in impure samples. The second part involved running 32 metadynamics simulations using both traditional and machine learning force fields (AMBER99SB-ILDN and Grappa) to study folding and dimerisation of four AMPs in polar and non-polar solvents. The results revealed consistent trends but also discrepancies—particularly in dimerisation—highlighting the importance of bonded parameters and the potential of ML-based force fields in capturing complex aggregation behaviours.
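The byproduct-prediction idea can be illustrated with a small Monte Carlo sketch. The project used Julia; this version is in Python, and it assumes that each coupling step succeeds with a residue-specific probability and that a failed coupling gives a deletion sequence rather than a capped chain. The efficiencies and the peptide sequence are hypothetical.

```python
import random
from collections import Counter


def simulate_synthesis(sequence, efficiency, n_chains=10_000, seed=0):
    """Monte Carlo sketch of stepwise peptide synthesis.

    efficiency maps a residue letter to the probability that its coupling
    succeeds (residues not listed are assumed to couple perfectly). A failed
    coupling skips that residue (a deletion); chains are not capped.
    Returns a Counter of product sequences.
    """
    rng = random.Random(seed)
    products = Counter()
    for _ in range(n_chains):
        chain = [res for res in sequence if rng.random() < efficiency.get(res, 1.0)]
        products["".join(chain)] += 1
    return products


def most_likely_byproducts(products, target, top=3):
    """Most frequent products other than the full-length target."""
    return [(seq, n) for seq, n in products.most_common() if seq != target][:top]


if __name__ == "__main__":
    # Hypothetical efficiencies and sequence, for illustration only.
    eff = {"K": 0.92, "W": 0.95, "L": 0.98}
    seq = "GLFKKLW"
    products = simulate_synthesis(seq, eff)
    for s, n in most_likely_byproducts(products, seq):
        print(s, n / 10_000)
```

The ranked deletion sequences give candidate masses to look for in an impure sample, which is the kind of guidance for experimental efforts described above.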
Alexandre developed a wide range of computing skills during his project, including proficiency with the UNIX command-line, virtual environments, Git, and high-performance computing. He gained hands-on experience with Monte Carlo simulations using Julia and Molecular Dynamics simulations via GROMACS, as well as applying cutting-edge machine learning techniques to study antimicrobial peptides. Notably, he trained neural networks to generate machine-learned collective variables for efficient biasing of MD simulations. These experiences have strengthened his interest in computational chemistry and confirmed his intention to pursue a PhD in the field.
Machine Learning Enabled Discovery of Point Defects Qubits for Quantum Technologies
Atshaam Ashraf
Host University: Imperial College London
Host Academic: Dr. Alex Ganose
The aim of this project was to use machine-learned interatomic potentials (MLIPs) to screen defect complexes that are suitable for use as point-defect qubits. The screening aimed to identify hosts that exhibit low defect energies, long coherence lifetimes, and the symmetries required for optically addressable qubits, so that they can be put forward for further study by experimental and computational groups in quantum technologies.
Atshaam’s project involved calculating defect formation energies for neutral Cr_Al substitutional defects (Cr on an Al site) in aluminium oxide phases using MLIPs and various computational tools. While MLIPs showed good agreement with DFT for primitive cells, they produced unphysical negative energies for defect structures, even after GGA+U corrections. This revealed limitations in the MLIP training data, which lacked accuracy for localised defect states. The findings highlight the need to train MLIPs on hybrid functional data to improve their reliability beyond bulk thermodynamics.
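For context, the quantity being computed is the standard defect formation energy, E_f = E_defect − E_host − Σ_i n_i μ_i + q(E_VBM + E_F) + E_corr, where n_i is the number of atoms of species i added (positive) or removed (negative). For a neutral Cr-on-Al substitution the charge terms vanish and E_f = E_defect − E_host − μ_Cr + μ_Al; the project reports that MLIPs gave unphysical negative values for this quantity. The sketch below evaluates the expression in plain Python; it is not the doped API, and the energies and chemical potentials in the example are invented.

```python
def defect_formation_energy(e_defect, e_host, delta_n, mu, charge=0,
                            e_fermi=0.0, e_vbm=0.0, e_corr=0.0):
    """E_f = E_defect - E_host - sum_i n_i * mu_i + q * (E_VBM + E_F) + E_corr (all in eV).

    delta_n: {species: atoms added (+) or removed (-)} relative to the host cell.
    mu: {species: chemical potential} for every species in delta_n.
    e_fermi is measured from the valence-band maximum e_vbm.
    """
    chemical_term = sum(n * mu[species] for species, n in delta_n.items())
    return e_defect - e_host - chemical_term + charge * (e_vbm + e_fermi) + e_corr


if __name__ == "__main__":
    # Invented total energies and chemical potentials (eV), for illustration only.
    e_f = defect_formation_energy(
        e_defect=-1000.0, e_host=-1002.0,
        delta_n={"Cr": 1, "Al": -1}, mu={"Cr": -9.5, "Al": -3.7},
    )
    print(f"E_f(Cr_Al, neutral) = {e_f:.2f} eV")
```

Because E_defect and E_host are large numbers that nearly cancel, small systematic errors from an MLIP in either total energy translate directly into the formation energy, which is one way localised-state inaccuracies can surface as negative values.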
Through this project, he gained hands-on experience with key tools in computational materials chemistry, including the Materials Project API, doped and ShakeNBreak packages, and VASP for DFT calculations on Imperial’s HPC cluster. He learned how machine-learned interatomic potentials (MLIPs) can significantly reduce computational cost and accelerate research workflows. The project also introduced him to defect chemistry, deepening his understanding of material properties and their theoretical foundations. As his first independent research experience, it strengthened his Python, data analysis, and problem-solving skills, and confirmed his interest in pursuing a research career in atomistic simulations using machine learning.
Machine Learning Force Fields for CO₂ Adsorption and Reduction in Porous Solids
Dongin Kim
Host University: Imperial College London
Host Academic: Prof. Aron Walsh
Dongin’s project aimed to create a general, descriptor-driven framework to connect homogeneous and heterogeneous catalysis by aligning electronic properties. Using the d-band centre of a homogeneous Rh–phosphine complex as a benchmark, Dongin designed Rh–P nanoparticles with matching surface electronic structures. This was achieved through a closed-loop workflow combining ML-accelerated molecular dynamics, DFT calculations, bonding analysis, and catalytic testing, all scripted for reproducibility and future application to other material systems.
The project found that the surface d-band centre deviation is the most reliable predictor of catalytic performance, outperforming geometric metrics. Using this insight, Rh₃P was identified as the optimal composition, surpassing previously favoured Rh₂P due to better electronic alignment with the homogeneous benchmark. Mechanistic analyses (Bader charge, XPS, COHP/ICOHP, PDOS) consistently showed that Rh₃P offers balanced bonding and near-optimal adsorbate interaction. Rh-rich compositions failed due to electronic phase separation and excessive Rh–Rh delocalisation. The team also delivered a reproducible ML-MD + DFT pipeline and proposed a transferable “electronic alignment” design principle for heterogeneous catalysts.
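The d-band centre used as the descriptor is conventionally the first moment of the d-projected density of states, ε_d = ∫E ρ_d(E) dE / ∫ρ_d(E) dE. A minimal sketch of that calculation, and of the deviation from a benchmark value, is given below. The integration window (the full band or up to the Fermi level) is a choice the sketch leaves to the caller through the energy grid passed in, and the triangular DOS used in the check is synthetic.

```python
def _trapezoid(y, x):
    """Trapezoidal integral of y(x) on a possibly non-uniform grid."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def band_centre(energies, pdos):
    """First moment of a projected DOS: integral(E * rho) / integral(rho)."""
    weighted = [e * rho for e, rho in zip(energies, pdos)]
    return _trapezoid(weighted, energies) / _trapezoid(pdos, energies)


def d_band_deviation(energies, pdos, benchmark_centre):
    """Absolute deviation of a surface d-band centre from a benchmark value (same units)."""
    return abs(band_centre(energies, pdos) - benchmark_centre)


if __name__ == "__main__":
    # Synthetic, symmetric d-DOS centred at -2 eV (illustration only).
    grid = [-3.0, -2.5, -2.0, -1.5, -1.0]
    dos = [0.0, 1.0, 2.0, 1.0, 0.0]
    print(band_centre(grid, dos), d_band_deviation(grid, dos, benchmark_centre=-1.5))
```

Ranking candidate compositions by this deviation, smallest first, is one straightforward reading of the “electronic alignment” principle described above.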
Dongin gained practical experience in applying AI-based molecular dynamics to a real-world catalytic design problem. She learned to use machine learning force fields (MACE and MatterSim) within ASE to run annealing–quenching simulations and validate structures before DFT analysis, deepening her understanding of the strengths and limitations of current ML potentials. She also developed skills in building reproducible computational workflows, integrating ML-MD, DFT, and electronic structure analysis into a coherent pipeline. This included automating nanosphere generation, MD runs, and figure preparation with statistical annotations. Overall, Dongin came to appreciate how AI-enabled simulations can accelerate catalyst discovery by bridging exploratory structure generation with rigorous quantum chemical analysis.
Multi-Objective Bayesian Optimisation Approach Towards Advancing Automated Liquid-Handling Platforms
Xiaojun Hu
Host University: Imperial College London
Host Academic: Prof. Becky Greenaway
The project aimed to apply Bayesian Optimisation (BO) to tune solvent-handling parameters on the Opentrons OT-2 robotic liquid-handling platform. A new evaluation metric, the Sum of Absolute Differences (SAD), was introduced to better capture the accuracy and precision of solvent transfers. Building on this, the longer-term goal is a fully automated closed-loop workflow that runs experiments with minimal human intervention. This approach is intended to improve efficiency, throughput, and safety, and is a step toward the next generation of chemistry laboratories.
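The page does not give the SAD formula. One natural reading, assumed here, is the sum over all test volumes of the absolute difference between the volume actually dispensed and the target volume. The dispensed volume could, for example, be inferred from the balance reading and the solvent density. The sketch below computes this quantity alongside R² to show how the two metrics can disagree. A transfer that always overshoots by 5 µL is perfectly linear (R² = 1) but still inaccurate, and SAD captures that error. The volumes are hypothetical.

```python
def sum_of_absolute_differences(targets, dispensed):
    """Assumed SAD definition: total absolute error over all test volumes (uL)."""
    if len(targets) != len(dispensed):
        raise ValueError("targets and dispensed must have the same length")
    return sum(abs(d - t) for t, d in zip(targets, dispensed))

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a least-squares line y ~ x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical run: every transfer overshoots its target by 5 uL.
targets = [100, 200, 300]
dispensed = [105, 205, 305]

print(r_squared(targets, dispensed))                    # -> 1.0 (perfectly linear)
print(sum_of_absolute_differences(targets, dispensed))  # -> 15 (total error in uL)
```

Because SAD is zero only when every transfer hits its target, it can serve directly as a minimisation objective for a BO loop.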
Xiaojun conducted dispensing experiments with ethanol and chloroform on the Opentrons OT-2, identifying seven key design variables that influence solvent transfer accuracy. She introduced the Sum of Absolute Differences (SAD) as a new metric, which showed stronger correlation with ideal dispensing behaviour than traditional R² measures. Using Web-BO with a Latin Hypercube Sampling strategy, she successfully optimised ethanol dispensing within four iterations, while chloroform required further refinement due to its complex physical properties. In the final phase, she contributed to developing a fully automated closed-loop dispensing and weighing station, writing and testing custom protocols. This work lays the foundation for autonomous experimental workflows with minimal human intervention.
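Latin Hypercube Sampling is a common way to choose the initial experiments before a Bayesian optimisation loop takes over. Each parameter range is split into as many equal strata as there are samples, and every stratum is sampled exactly once, so even a small first batch covers all seven dimensions. The sketch below is a generic pure-Python implementation and not Web-BO's own code. The seven parameter ranges are placeholders, because the page does not name the design variables.

```python
import random

def latin_hypercube(n_samples, bounds, seed=None):
    """Return n_samples points; each dimension has one point per equal-width stratum."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum [i/n, (i+1)/n) of the unit interval.
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(unit)  # decouple the stratum order between dimensions
        columns.append([low + u * (high - low) for u in unit])
    return [list(point) for point in zip(*columns)]  # one row per experiment

# Placeholder ranges for seven pipetting parameters (hypothetical units and limits).
bounds = [(10.0, 150.0), (10.0, 150.0), (0.0, 2.0), (0.0, 2.0),
          (0.0, 20.0), (0.5, 5.0), (0.0, 10.0)]
initial_design = latin_hypercube(n_samples=8, bounds=bounds, seed=42)
for row in initial_design:
    print([round(v, 2) for v in row])
```

Each row would correspond to one dispensing experiment. Its measured SAD would then seed the surrogate model that proposes later iterations.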
Through this project, Xiaojun learned to operate and troubleshoot the Opentrons OT-2, design fair performance metrics, and apply Bayesian Optimisation for experimental parameter tuning. She gained experience in collaborative coding workflows using Python, VS Code, and GitHub, and developed confidence in presenting results to a research group. Working closely with the Greenaway team, she learned to collaborate effectively and engage in a dynamic research environment. Under the guidance of doctoral researchers Sean Gurung and Alex Ostudin, and with technical support from Dr. Austin Morz, she gained valuable exposure to Web-BO. Xiaojun is especially grateful to Prof. Becky Greenaway for the opportunity to explore the field of computational chemistry.
NeuralBind: Enhancing chemical coverage and diversity of training data for binding-affinity predictions
Savva Grevtsev
Host University: University of Oxford
Host Academic: Prof. Philip Biggin
The aim of this project was to train ML models (RFscore, EHIGN, AEV-PLIG) on a range of real-world and synthetic datasets, evaluate their performance, infer what the models learn and how well they generalise, and assess the feasibility of using synthetic data for binding affinity prediction.
Savva contributed as second author to a published paper (arXiv:2507.07882) and made significant contributions to multiple codebases. The project revealed that both synthetic and real-world datasets in molecular modelling suffer from inconsistent quality, and that many commonly used benchmarks and tools are flawed. Despite advances in model architectures, performance has largely plateaued, with GNNs showing only modest improvements, attributed to their better inductive biases. While synthetic data can help, stringent quality control is essential: large volumes of mediocre data offer little benefit. The findings suggest that future progress in the field will depend heavily on large-scale, high-quality data efforts, unless a new modelling paradigm emerges.
Savva gained deeper familiarity with Python packages, the Git version control system, and remote HPC clusters while training and applying machine learning models, primarily random forests and graph neural networks, for binding affinity scoring. He developed skills in dataset curation and model performance evaluation, which are essential in bio- and cheminformatics. The project also highlighted the widespread problem of poorly documented and unreliable codebases in the field, which he found frustrating, although he noted gradual improvements. Overall, the experience broadened his technical capabilities and reinforced the importance of clean, maintainable code in computational research.
Towards the Development of Asynchronous Solvent Handling Capabilities for Automated Liquid Handling
Jessica Lai
Host University: Imperial College London
Host Academic: Prof. Becky Greenaway
This project focused on improving the accuracy of the Opentrons OT-2 robotic liquid handler when dispensing volatile organic solvents, which often drip because of pressure build-up in the tip when pipetting parameters calibrated for water are used. To tackle this, a custom function that adds an extra air-gap step was developed to mitigate dripping. This function was integrated into a closed-loop Bayesian optimisation workflow, allowing automated optimisation of pipetting parameters across various solvents. Compared with traditional manual optimisation, the approach reduces human bias, time, and labour.
Jessica developed a custom midpoint function for the Opentrons OT-2, which introduced an air-gap step between the pipetting origin and destination to reduce solvent dripping. This function was optimised for volatile and dense solvents such as DCM, Et₂O, and CHCl₃, resulting in lower standard deviations and improved linearity in dispensing accuracy. She implemented a web-based Bayesian optimisation (Web-BO) workflow to fine-tune seven key pipetting parameters, using defined ranges and iterative feedback. A fully automated closed-loop system was established, integrating the OT-2, a DOBOT robotic arm, and a weighing balance to execute and evaluate dispensing tasks autonomously. The midpoint function improved the precision of the optimisation process, supporting more reliable and consistent solvent handling.
Through this project, Jessica strengthened her Python programming skills by troubleshooting and developing a custom midpoint function, gaining confidence in error handling and collaborative coding. She expanded her technical knowledge by working with Linux, GitHub, and PowerShell, and learned to adapt her code within a team setting. The project introduced her to the Opentrons OT-2, where she explored its hardware and software architecture, including low-level API control. She also gained a solid understanding of Bayesian optimisation and its application in refining experimental conditions efficiently. Beyond technical skills, Jessica improved her presentation abilities and developed a deeper appreciation for consistency and teamwork in a research environment.
Transforming Undergraduate Learning: Migration of a Student Computational Drug Discovery Assignment to a Python-Based Project
Zhaohui Jiang
Host University: Manchester Metropolitan University
Host Academic: Dr. Alex Aziz
Zhaohui’s project aimed to transition a second-year computational drug discovery assignment from SCIGRESS, a proprietary tool, to an open-source Python-based workflow using Jupyter Notebook. The goal was to equip students with practical Python programming and data analysis skills while engaging them in virtual drug screening tasks. Using libraries such as NumPy, Pandas, Matplotlib, and Scikit-Learn, the workflow introduces QSAR modelling, Lipinski’s Rule of Five, and IC₅₀ data analysis for HIV-1 protease compounds. He developed interactive teaching materials structured for delivery over five sessions, ensuring accessibility and reproducibility. Ultimately, the project supports modern, accessible computational chemistry education using open tools and real-world datasets.
He developed a Python-based virtual drug screening pipeline, retrieving and cleaning IC₅₀ data for HIV-1 protease from ChEMBL, calculating molecular descriptors with RDKit, and applying Lipinski’s Rule of Five for initial filtering. He built and evaluated QSAR models using Scikit-Learn, finding that random forest regression outperformed linear regression in predictive accuracy. This model was then used to screen compounds from the SWEETLEAD database, identifying the top ten potential inhibitors. On the teaching side, Zhaohui created interactive Jupyter Notebook tutorials covering molecular visualisation, cheminformatics, and machine learning, enhanced with ipywidgets and Matplotlib. These deliverables provide a complete research-to-teaching workflow, supporting future interdisciplinary education in AI-driven drug discovery.
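Lipinski’s Rule of Five flags a compound when its molecular weight is above 500 Da, its calculated logP is above 5, it has more than 5 hydrogen-bond donors, or it has more than 10 hydrogen-bond acceptors. The sketch below applies this filter to descriptor values assumed to have been computed earlier. In RDKit, these values are available from functions such as `Descriptors.MolWt`, `Crippen.MolLogP`, `Lipinski.NumHDonors` and `Lipinski.NumHAcceptors`. Allowing at most one violation is a common convention, not necessarily the one used in the course materials, and the compounds are hypothetical.

```python
# Lipinski thresholds: (descriptor key, maximum allowed value).
RULE_OF_FIVE = [("mol_wt", 500.0), ("logp", 5.0), ("h_donors", 5), ("h_acceptors", 10)]

def lipinski_violations(descriptors):
    """Count how many Rule of Five limits a compound exceeds."""
    return sum(1 for key, limit in RULE_OF_FIVE if descriptors[key] > limit)

def passes_rule_of_five(descriptors, max_violations=1):
    """Assumed convention: tolerate up to one violation."""
    return lipinski_violations(descriptors) <= max_violations

# Hypothetical precomputed descriptors (e.g. from RDKit).
compounds = {
    "cpd_A": {"mol_wt": 320.4, "logp": 2.1, "h_donors": 2, "h_acceptors": 5},
    "cpd_B": {"mol_wt": 720.9, "logp": 6.3, "h_donors": 4, "h_acceptors": 12},
    "cpd_C": {"mol_wt": 540.6, "logp": 4.0, "h_donors": 3, "h_acceptors": 8},
}
kept = [name for name, d in compounds.items() if passes_rule_of_five(d)]
print(kept)  # -> ['cpd_A', 'cpd_C']
```

Keeping the thresholds in a single list lets students adjust or extend the rules in one place, for example to test a stricter zero-violation filter.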
Through this project, Zhaohui developed both technical and conceptual skills in applying Python and AI to chemistry. He learned to visualise and analyse chemical data using tools and resources such as py3Dmol, ChEMBL, and RDKit, exploring the relationship between molecular properties and pharmacological activity. Using Scikit-Learn, he built and compared QSAR models, gaining insight into model performance, feature selection, and data quality. Additionally, by creating interactive Jupyter Notebook tutorials, he improved his ability to communicate complex research processes and deepened his understanding of how AI can advance chemical research.
AI Agents for Chemistry: Designing the LLM-CDS Prototype
Meng Fang – University of Liverpool
Purpose, Aims, and Scope of the Visit:
This three-month visit to University College London, hosted by Prof. Jun Wang, will focus on designing and prototyping a Large Language Model-based Chemical Data Scientist (LLM-CDS). Building on the Agent K framework, the project aims to create an autonomous agent capable of structured reasoning across diverse chemical data and literature to support reaction prediction, property inference, and hypothesis generation. The visit will deliver a functional prototype, establish benchmarking protocols, and develop collaborative outputs including a technical paper and funding proposal. This collaboration combines expertise in AI agents and chemistry, advancing trustworthy, data-driven automation in chemical discovery.
Autonomous Optimisation of Sustainable Ligand Synthesis for Materials Discovery
George Lyall-Brookes – University of Liverpool
Purpose, Aims, and Scope of the Visit:
George’s proposed visit to the University of Cambridge, hosted by Prof. Alexei Lapkin, will focus on developing a sustainable and transferable route to polyaromatic ligand precursors using self-optimising algorithms in flow chemistry. These ligands are essential for constructing advanced materials, yet their synthesis remains a bottleneck. By applying multi-objective optimisation and autonomous closed-loop systems, George aims to create high-yielding, cost-effective, and environmentally friendly procedures. The visit will combine hands-on lab work with algorithmic development, increasing his expertise in flow chemistry and AI-driven synthesis while contributing to scalable solutions for high-throughput materials screening.
Generative AI-Guided Discovery of Polar Materials for Ferroelectric Applications
Andrij Vasylenko – University of Liverpool
Purpose, Aims, and Scope of the Visit:
This collaborative visit aims to establish a computational–experimental partnership between Andrij and Empa to accelerate the discovery of new non-centrosymmetric polar materials, particularly ferroelectrics. The project will apply PIGEN, a new in-house physics-informed generative AI model for inorganic crystal structure design, to predict and prioritise promising candidates for experimental synthesis. Two short visits will align research goals, validate AI-generated structures, and refine modelling strategies toward a publication and a Horizon Europe proposal. By combining AI-based prediction with Empa’s synthesis expertise, the collaboration will demonstrate the power of generative models in guiding the experimental discovery of functional materials.
Integrating Machine Learning Potentials with Neutron Scattering for Advanced Materials Simulation at the European Spallation Source
Harry Richardson – University of Bristol
Purpose, Aims, and Scope of the Visits:
This visit supports the development of a machine learning-enhanced workflow for analysing neutron scattering data at the European Spallation Source (ESS). The project focuses on integrating machine learning interatomic potentials (MLIPs) into neutron scattering data workflows, improving automation and analytical precision. Through collaboration with the ESS Data Management and Software Centre, Harry will develop methods linking MLIP simulations to experimental scattering pipelines. The visit will also explore opportunities for ESS “first science” experiments that validate MLIPs using Quasi Elastic Neutron Scattering (QENS) on liquid systems, and will refine fine-tuning strategies for disordered materials. This collaboration will strengthen the synergy between machine learning and neutron scattering, ensuring that data-driven, physics-informed analysis is embedded in future ESS research infrastructure.
Understanding of mechanistic principles in gold(I)-catalysed synthesis using a top-down machine learning approach
Risnita Listyarini – University of Strathclyde
Purpose, Aims, and Scope of the Visit:
During this two-month visit to the University of British Columbia, Risnita will collaborate with Dr. Jolene Reid to apply data-driven machine learning approaches to the study of asymmetric gold(I)-catalysed reactions. The project aims to uncover the mechanistic principles governing enantioselectivity using top-down modelling techniques such as clusterwise linear regression, which identify key molecular interactions with minimal human bias. By developing predictive statistical models, the collaboration seeks to extend mechanistic understanding across diverse reaction types, improving catalyst design and reaction optimisation. This work will strengthen Risnita’s expertise in data-driven chemical modelling while advancing the broader goal of applying AI to complex organic reaction mechanisms.
Leveraging Ontological Knowledge with Argumentative Agentic AI to Accelerate Chemical Development
Prof. Alexei Lapkin – University of Cambridge
Prof. Francesca Toni – Imperial
Dr. Antonio Rago – King’s College London
Project Summary
In this project we aim to demonstrate how relational databases can enhance AI workflows, focusing on an advanced argumentative agentic AI approach and on a well-known and highly challenging problem: predictive scalability in chemical process development. Specifically, the project will develop a human-in-the-loop AI framework for guiding multi-step process scale-up in the manufacture of small-molecule active pharmaceutical ingredients (APIs).
Project Team
Prof. Alexei Lapkin is a Professor of Sustainable Reaction Engineering at the University of Cambridge. His group pioneered the use of Bayesian Optimisation in process engineering and developed tools that have been widely adopted in academia and industry (Summit, ORDerly, reaction balancing, etc.). His work on AI methods in chemical process development led to the establishment of four start-up companies (ReactWise, Accelerated Materials, GreenCAT, Chemical Data Intelligence).
Prof. Francesca Toni is Professor in Computational Logic and Royal Academy of Engineering/JP Morgan Research Chair on Argumentation-based Interactive Explainable AI at Imperial College London, UK. She leads the Computational Logic and Argumentation research group and the XAI research centre. Her research interests lie within the broad area of Knowledge Representation and Reasoning in AI and Explainable AI. Among notable past projects, she has coordinated two EU projects, was Technical Director of the EPSRC-funded ROAD2H project, and was co-Director of the Centres for Doctoral Training in Safe and Trusted AI and in AI for Healthcare. She is on the Board of Directors of KR Inc. and is an IJCAI trustee.
Dr Antonio Rago is a Lecturer in Computer Science at King’s College London. He has been one of the pioneers of argumentative XAI, delivering numerous high-profile talks on the subject, including one in IJCAI’s invited Early Career Researcher track. He has been instrumental in finding novel application areas for argumentation and XAI, including mechanical engineering, online review aggregation, and judgmental forecasting. In collaboration with the Mercedes-AMG Petronas F1 Team, he has also pioneered the use of explainable reinforcement learning (RL) for F1 race strategy.
Potential for Impact
The project will benefit the academic community working in AI for chemistry by developing a standardised process chemistry ontology and a suite of agents that operate with relational chemistry databases. This will create a foundational layer for the future deployment of advanced AI tools in process chemistry, which can be extended to discovery chemistry and materials. The human-in-the-loop framework of multi-agent argumentative AI is an advanced AI concept and will serve as a critical demonstration of AI advances applied to chemistry challenges, while the resulting code could be rapidly implemented in industrial settings.
Alignment of Generative AI for Materials Discovery via Experimental Feedback
Dr. Shijing Sun – University of Cambridge
Prof. Aron Walsh – Imperial
Project Summary
Our project directly addresses the challenge of bridging the simulation-to-reality gap in inorganic materials discovery. We propose an integrated, closed-loop platform that connects generative AI for crystals with high-throughput robotic synthesis and characterisation. We will deliver a processing-conditioned, disorder-aware generative model calibrated by small-batch experimental feedback. Our objective is to align generative models with real experimental outcomes, enabling AI-driven discovery in which hypothesis generation, synthesis, and evaluation are all automated and connected.
Project Team
Dr Shijing Sun is an experimental materials scientist specialising in high-throughput synthesis, characterisation, and accelerated device testing for energy materials across academia and industry. Her work centres on the automated processing of solution-processed semiconductors (halide perovskites and perovskite-inspired compounds). This includes recent work applying computer vision and reinforcement learning to microstructure analysis and synthesis planning of defect-minimised Ag–Bi–I thin films, and a Bayesian-optimisation-driven robot for lead halide perovskite synthesis under humid conditions. Formerly Assistant Professor of Mechanical Engineering at the University of Washington, she developed low-cost, modular “Lego-style” self-driving laboratory platforms for solution-phase synthesis. In September 2025 she joined the Department of Materials Science and Metallurgy at the University of Cambridge, where she is establishing the Autonomous Labs for Energy Materials, focused on functional inorganic and hybrid framework materials.
Prof. Aron Walsh is a computational materials chemist whose research integrates quantum-mechanical simulation, machine-learned force fields, and generative models to design functional materials for renewable energy and optoelectronics. He is Professor at Imperial College London and leads the Materials Design Group, which develops open, reproducible software and data workflows used internationally. He has led large, multi-partner programmes and supervises an interdisciplinary team spanning chemistry, physics, and computer science; alumni now hold positions in academia and industry. Within this project he will direct the computational work package, linking ab initio calculations with ML force fields and generative models, and establishing FAIR practices to couple predictions with experimental validation and iteration.
Potential for Impact
The outcomes will advance both the AI–chemistry interface and the materials science community:
Scientific Impact: Assess whether generative AI can predict stable material phases that are experimentally realisable. Success would demonstrate a viable route around the limitations of traditional high-throughput computational screening, which is confined to known chemical space.
Community Impact: The UK is competitive globally in laboratory automation, exemplified by autonomous synthesis platforms at the University of Liverpool, University of Glasgow, etc. Building on this national strength, our proposal brings a complementary synthesis capability for solution-processed thin-film semiconductors, led by new PI Sun at the University of Cambridge. All generated structures, experimental characterisation data, and model checkpoints will be shared openly via the Materials Cloud and GitHub, following FAIR data principles. This ensures reproducibility and supports downstream adoption by other research groups and industrial partners.
Beyond the Project: The proposed AI–experiment loop can be extended beyond halides to many other chemical systems, helping to explore parts of materials space that traditional screening overlooks. By making generative design disorder- and processing-aware, the approach can uncover realistic new candidates across fields such as energy storage, catalysis, and quantum technologies. This foundation supports future EPSRC Centre or Programme Grant proposals on closed-loop AI materials discovery.
Human-AI Teaming for Chemistry (HATCH)
Dr. Jihong Zhu – University of York
Dr. Gabriella Pizzuto – University of Liverpool
Professor Robert Gaizauskas – University of Sheffield
Professor Ian Fairlamb – University of York
Project Summary
A step-change in synthetic chemistry can only be realised through an intelligent and physical synergy between human chemists, AI coordinators, robotic platforms, and chemistry equipment. This project aims to provide and validate such a framework, bridging the gap between today’s highly equipped specialist laboratories and the wider chemistry community to create an accessible, truly collaborative laboratory of the future.
Project Team
Dr. Jihong Zhu is an expert in robot learning. He established the Robot-assisted Living Lab (RALLA) at York, equipped with bimanual manipulators and advanced sensors, which provides the core infrastructure and technical leadership for this project. His pioneering work on human-centred bimanual manipulation has demonstrated success in creating complex human-AI collaborative systems.
Dr. Gabriella Pizzuto is a RAEng Research Fellow specialising in AI-driven “robotic scientists.” Her work on robot learning for chemistry applications, recognised at flagship robotics conferences (ICRA, CASE), has attracted additional EPSRC funding, including a recently awarded New Investigator Award. Her leadership roles within the Henry Royce Institute and AIchemy (ECR committee co-chair) place her at the forefront of this field.
Prof. Robert Gaizauskas is a world-leading expert in Natural Language Processing (NLP). He has worked extensively on text mining from the scientific literature and his recent research focusing on using LLMs for spoken language interaction with robots is directly relevant. His leadership in the Sheffield NLP group and a UKRI CDT ensures AI development will be built on a foundation of cutting-edge research and best practices. He brings not only NLP expertise, but the experience and expertise of colleagues in robust speech recognition in noisy environments, with whom he collaborates closely.
Prof. Ian Fairlamb provides world-class expertise in metal catalysis and automated chemistry. He directs the highly equipped Automated Robotics for Chemistry Laboratory at York, a hub connecting chemistry with computer science and engineering. His strong industrial links, including a PhD studentship with Labman and equipment investment from Chemspeed, underscore the industrial relevance of his work. As co-director of ALBERT, a doctoral training centre on automated laboratory experiments, he is at the forefront of training the next generation of researchers in this field.
Potential for Impact
This project will transform chemical discovery by combining AI, automation, and robotics into a self-driving laboratory. By dramatically increasing the speed, safety, and efficiency of experimentation, it will accelerate the development of new medicines, sustainable materials, and clean energy solutions, helping tackle global challenges from healthcare to climate change.