Explore our Funded Projects

Browse through the projects funded by AIchemy, highlighting cutting-edge research, training, and collaboration at the intersection of AI and chemistry.

This page will be regularly updated with project outcomes, results, and new developments, so please check back often to stay informed about the latest progress from these projects.

  • Student Internships 2024
  • Pump Priming Funding Call 2024
  • Student Internships 2025
  • Collaborative Travel (R1) 2025

Accelerating Chemistry Lab Automation Through AI-Driven Robotics

Alex Wright
Host University: University of Liverpool

Host Academic: Dr. Gabriella Pizzuto

Alex’s project focused on investigating the simulation-to-reality gap in particle behaviour for robotic chemistry, using the ORBIT open-source robotics simulator. Alex focused on developing a method to represent this gap in simulated particles to improve their usability for robotic chemists. The project involved deploying a simulated robotic environment to measure the angle of repose and comparing these results to real-world experiments. Ultimately, the goal was to refine simulation parameters to better align virtual and physical systems.

The project resulted in the development of a low-cost powder flow testing method to analyse discrepancies between simulated and real particle behaviour. Key findings revealed significant challenges in accurately replicating particle collisions, as small particles passed through simulation boundaries, skewing comparisons. Additionally, simulated salt particles differed markedly from their real counterparts in appearance and flow, producing an unrealistically low angle of repose. These findings highlight the need for further refinement of environment parameters or alternative particle models. With more time and data, the workflow could be improved to reduce these discrepancies and enhance the simulator’s realism.
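As a rough illustration of the comparison described above, the angle of repose of an idealised conical pile can be estimated from its height and base radius as arctan(height / radius). The sketch below assumes that conical idealisation; the function names and the numbers are hypothetical and are not taken from the project's workflow or measurements.

```python
import math


def angle_of_repose_deg(pile_height: float, base_radius: float) -> float:
    """Angle of repose (degrees) of an idealised conical pile: atan(h / r)."""
    if pile_height < 0 or base_radius <= 0:
        raise ValueError("height must be >= 0 and radius must be > 0")
    return math.degrees(math.atan2(pile_height, base_radius))


def repose_gap_deg(simulated: tuple, real: tuple) -> float:
    """Simulated minus real angle of repose; negative means the simulation is too flat."""
    return angle_of_repose_deg(*simulated) - angle_of_repose_deg(*real)


# Illustrative (height, radius) pairs in arbitrary length units, NOT project data.
simulated_pile = (1.0, 4.0)
real_pile = (2.0, 3.5)
print(f"gap = {repose_gap_deg(simulated_pile, real_pile):.1f} degrees")
```

A negative gap would correspond to the situation reported above, where the simulated particles settle at a lower angle than the real powder.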

Through this project, Alex strengthened his ability to work within a collaborative coding environment, building upon existing particle physics frameworks and integrating new methods. He gained foundational experience with ROS, learning how simulated environments can be transferred to physical robotic setups. He also improved his communication and teamwork skills, as he frequently collaborated with colleagues to troubleshoot and refine code. Additionally, Alex improved his technical abilities in 3D CAD modelling and scientific presentation, broadening his practical and professional skill set beyond coding alone.

Bayesian Optimization for Chemical Applications

Eltjo Mante
Host University: Imperial College London

Host Academic: Prof. Kim Jelfs

This project aimed to enhance a Bayesian Optimization web application (web-BO) and explore multi-fidelity Bayesian Optimization (MFBO) for chemical problems. The first goal was to improve the usability and functionality of web-BO through bug fixes and new features. The second goal involved investigating MFBO methods in the context of chemical simulations, particularly focusing on the 6D Hartmann function.

Key outcomes included the development of a video tutorial page to replace a looping GIF, enabling users to pause and navigate freely. An explanations page was added to clarify tool functionality and input formats, complete with sidebar navigation and tooltips. Several bugs were resolved, including overlapping input cells and trailing decimals, and graph displays were cleaned up for clarity.

On the MFBO side, the project compared low- and high-fidelity optimization methods, revealing that low-fidelity approaches often reached the global minimum more effectively than high-fidelity ones, which tended to get stuck in local minima. A major challenge was modeling correlation between fidelities, which proved to be highly problem-specific and complex. Future work could focus on integrating MFBO into web-BO, with correlation modeling as a key step.
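For readers unfamiliar with the benchmark, the 6D Hartmann function is a standard synthetic test problem with several local minima on the unit hypercube. The sketch below uses the commonly published constants and is only meant to show what an optimiser is evaluating; it is not code from web-BO, and the variable names are our own.

```python
import math

# Commonly published constants for the 6D Hartmann benchmark.
ALPHA = (1.0, 1.2, 3.0, 3.2)
A = (
    (10.0, 3.0, 17.0, 3.5, 1.7, 8.0),
    (0.05, 10.0, 17.0, 0.1, 8.0, 14.0),
    (3.0, 3.5, 1.7, 10.0, 17.0, 8.0),
    (17.0, 8.0, 0.05, 10.0, 0.1, 14.0),
)
P = (
    (0.1312, 0.1696, 0.5569, 0.0124, 0.8283, 0.5886),
    (0.2329, 0.4135, 0.8307, 0.3736, 0.1004, 0.9991),
    (0.2348, 0.1451, 0.3522, 0.2883, 0.3047, 0.6650),
    (0.4047, 0.8828, 0.8732, 0.5743, 0.1091, 0.0381),
)

# Reported location of the global minimum (value approximately -3.32237).
X_STAR = (0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573)


def hartmann6(x):
    """Evaluate the 6D Hartmann function at a point x in [0, 1]^6."""
    if len(x) != 6:
        raise ValueError("hartmann6 expects a 6-dimensional point")
    total = 0.0
    for alpha_i, a_row, p_row in zip(ALPHA, A, P):
        exponent = sum(a * (xj - p) ** 2 for a, xj, p in zip(a_row, x, p_row))
        total += alpha_i * math.exp(-exponent)
    return -total


print(hartmann6(X_STAR))
print(hartmann6((0.5,) * 6))
```

Multi-fidelity studies typically pair such a function with a cheaper, perturbed approximation; how the two fidelities are correlated is exactly the problem-specific modelling question raised above.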

Eltjo gained hands-on experience in machine learning, full-stack web development (Flask, HTML, CSS, Bootstrap, JavaScript), and version control using Git and GitHub. The collaborative environment also fostered growth in communication and teamwork, balancing independent problem-solving with guidance from experienced peers.

Data Science Approach to Crystal Structures

Ziqiu Jiang
Host University: University of Liverpool

Host Academic: Prof. Vitaliy Kurlin

This project aimed to utilize the average-minimal-distance (AMD) metric to assess the similarity between novel predicted crystal structures and experimentally determined structures stored in the Cambridge Structural Database (CSD). This comparison is essential for validating computational predictions of protein structures by providing a quantitative measure of how closely they align with known experimental data.

Ziqiu successfully computed the nearest neighbours of predicted crystal structures by comparing them with over 850,000 experimental crystal structures from the Cambridge Structural Database (CSD). This was achieved using the Average Deviation from Asymptotic (ADA) metric, an invariant that accounts for both geometric considerations (through AMD) and the density of the atomic points in crystals. It was observed that most predicted crystal structures with high density tend to closely match experimental structures. However, lower-density predicted crystals often displayed more significant gaps when compared to experimental structures. These cases have the potential to inform further research in crystal structure prediction and validation.
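To make the AMD idea concrete: for a periodic point set, the k-th AMD value is the distance to the k-th nearest neighbour, averaged over the points in the unit cell. The sketch below is a simplified illustration only. It assumes an orthorhombic cell, fractional motif coordinates, and a fixed number of neighbouring cell shells (`shells`) that must be large enough to contain the k-th neighbour; it is not the implementation used in the project and does not compute ADA.

```python
import itertools
import math


def amd(motif_frac, cell_lengths, k, shells=2):
    """Return [AMD_1, ..., AMD_k] for an orthorhombic cell (illustrative sketch).

    motif_frac   : fractional coordinates of the points in one unit cell
    cell_lengths : (a, b, c) edge lengths of the orthorhombic cell
    shells       : how many neighbouring cells to include in each direction
                   (assumed large enough to contain the k-th neighbour)
    """
    shifts = list(itertools.product(range(-shells, shells + 1), repeat=3))
    rows = []
    for i, p in enumerate(motif_frac):
        dists = []
        for j, q in enumerate(motif_frac):
            for s in shifts:
                if i == j and s == (0, 0, 0):
                    continue  # skip the point itself
                d2 = sum(
                    ((q[a] + s[a] - p[a]) * cell_lengths[a]) ** 2 for a in range(3)
                )
                dists.append(math.sqrt(d2))
        if k > len(dists):
            raise ValueError("k is too large for the chosen number of shells")
        dists.sort()
        rows.append(dists[:k])  # distances to the 1st..k-th nearest neighbours
    # Average each neighbour rank over all motif points.
    return [sum(row[n] for row in rows) / len(rows) for n in range(k)]


# Simple cubic lattice with unit spacing: six neighbours at distance 1,
# followed by neighbours at distance sqrt(2).
print(amd([(0.0, 0.0, 0.0)], (1.0, 1.0, 1.0), k=7))
```

Two structures can then be compared by a distance between their AMD vectors, which is the kind of nearest-neighbour search against a large database that is described above.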

Ziqiu gained a deeper understanding of mathematical invariants and their applications in computational chemistry. He also learned key concepts in crystallography, particularly the structural properties of crystals and the definitions of crystal components, and how these concepts can be integrated into computational models. By applying algorithms to compare predicted crystal structures with experimental data, he developed practical skills in analysing crystal structures and identifying structural similarities, further reinforcing his computational and analytical expertise.

Learning from large-scale CSP – database-informed prediction of spontaneous resolution

Leo Arogundade
Host University: University of Southampton

Host Academic: Prof. Graeme Day

This project aimed to investigate classifier models for predicting chiral resolution by crystallisation of small organic molecules. The models were trained on computational data comparing the relative stabilities of racemic and enantiopure crystal structures. As part of the workflow, Leo used the Cambridge Structural Database API to extract molecular files, applied RDKit to calculate molecular descriptors, and trained logistic regression and support vector machine models to predict crystallisation outcomes. The project contributes to understanding how molecular features influence chiral resolution and explores the potential of machine learning in crystal engineering.

Leo applied several classification models to predict chiral resolution based on energy differences between racemic and enantiopure crystal structures from a dataset of 298 molecules. Initial models (SVM, logistic regression, decision tree) achieved over 90% accuracy but were biased toward predicting racemic crystallisation due to class imbalance. A second round of modelling with balanced data reduced bias but also lowered accuracy, which was later improved through feature selection and de-correlation, resulting in final accuracies of 83% (SVM), 62% (logistic regression), and 73% (decision tree). Toward the end of the project, Leo began planning how to expand the dataset by proposing new molecules for future study in the forerunner CSP project. These outcomes highlight both the potential and limitations of machine learning in predicting crystallisation behaviour.
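The effect of class imbalance described above can be seen with a trivial baseline: a model that always predicts the majority class scores high on plain accuracy while learning nothing about the minority class. The sketch below uses made-up label counts (90 versus 10) purely for illustration; they are not the project's data, and the helper functions are our own rather than any library's API.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced labels (NOT the project's dataset).
y_true = ["racemic"] * 90 + ["enantiopure"] * 10
# A "model" that always predicts the majority class.
y_pred = ["racemic"] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

Balancing the training data, as in the second round of modelling, trades some headline accuracy for predictions that are informative about both outcomes.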

Through this project, Leo gained practical experience in literature analysis, molecular data extraction using RDKit and MOL files, and using the Cambridge Structural Database API to retrieve crystal information via refcodes. He developed a strong understanding of machine learning algorithms—support vector machines, logistic regression, and decision trees—and learned how to implement them in Python. Additionally, he improved his presentation skills, and learned to set up computing environments on a supercomputer (Iridis5), broadening his technical capabilities in high-performance computing.

Machine Learning Tight Binding for Proton Battery

Xuheng Zhao
Host University: Imperial College London

Host Academic: Dr. Jarvist Frost

Xuheng’s project aimed to optimise Density Functional Tight Binding (DFTB) parameters from a standard database specifically for targeted materials, using machine learning techniques with the assistance of the Tight Binding Machine Learning Toolkit (TBMaLT). These refined parameters are intended for use in molecular dynamics simulations to predict material properties, contributing to the development of a proton battery. The project focused on building and validating a workflow using small molecules to demonstrate its feasibility and accuracy. Ultimately, the goal is to establish a robust, transferable simulation pipeline for future energy-related materials research.

A machine learning workflow using TBMaLT was successfully built to optimise DFTB parameters, involving dataset preparation, input formatting, model training with PyTorch, and performance evaluation against DFT ground truth. However, due to limitations in TBMaLT’s current development stage, the predicted properties lacked sufficient accuracy, and final parameter extraction for molecular dynamics was not completed. In parallel, Xuheng developed a non-ML optimisation workflow using the Nelder-Mead algorithm, which directly refines DFTB parameters via DFTB+ calculations and MSE loss minimisation. While promising, this workflow requires further refinement to ensure physically meaningful Hamiltonian matrices. Together, these efforts lay the groundwork for a robust simulation pipeline to support proton battery material design.

Through this project, Xuheng learned the fundamentals of tight-binding methods and how they integrate with AI in scientific research, enhancing her understanding of AI’s role in advancing science. Additionally, she gained knowledge in compiling and using software such as Gaussian, FHI-aims, and DFTB+, understanding their suitability and limitations for various purposes. She also acquired a basic understanding of the development of scientific software packages such as TBMaLT.

Machine Learning-Assisted High-Throughput Screening for Organic Semiconductors: A Comprehensive Study and Database Development

Malin Zollner
Host University: University of Strathclyde

Host Academic: Dr. Tahereh Nematiaram, Dr. Yashar Moshfeghi

Malin’s project aimed to explore the research landscape of organic semiconductors, focusing on their applications in sensors, photovoltaics, and light-emitting diodes. A key objective was to establish a literature-based database containing experimental data such as power conversion efficiency and structural/electronic fingerprints of donor and acceptor materials. Malin also planned to compute structural fingerprints and replicate machine learning methodologies from previous studies to trial high-throughput screening techniques. The ultimate goal was to support materials selection and optimisation using machine learning for organic photovoltaic applications.

She conducted a thorough literature review on organic semiconductors, identifying key descriptors, fingerprints, and machine learning algorithms used in sensor, photovoltaic, and LED applications. She developed code to run machine learning models and generate a database of molecular fingerprints and descriptors, which is now available for future research. A detailed database of organic donor and acceptor molecules was compiled, enhanced with Morgan, Daylight, and MACCS fingerprints. This resource supports ongoing efforts in machine learning-assisted materials screening for organic photovoltaics.

Throughout the project, Malin gained a solid understanding of the fundamentals of organic semiconductors and how artificial intelligence is being applied in this field through literature review and research. Regular meetings with her supervisors helped her improve presentation skills and deepen her understanding of the research process. Under expert guidance, she developed and refined her coding skills, which were especially valuable given her limited prior experience. She also learned how to run machine learning algorithms and became familiar with key AI terminology, marking her first steps into computational research. Overall, the project provided a strong foundation for future work in AI-driven materials science.

Mapping the chemical space of intermetallic compounds

Ryan Napo Nduma
Host University: Imperial College London

Host Academic: Prof. Aron Walsh and Anthony Onwuli

The aim of the project was to understand the chemical properties that distinguish intermetallics from other possible compounds within the chemical space, and then to use machine learning and relevant computational techniques to contribute to the Walsh Materials Design group’s SMACT software by building a series of filters and rules. These filters and rules would allow SMACT to screen appropriately for these compounds within the chemical space.

Ryan developed two key rules, the Valence Electron Count and the Electronegativity Difference, to help distinguish intermetallic compounds, based on insights from literature and guidance from Prof. Walsh and Anthony. He applied these features in a classification exercise using random forest and logistic regression models, exploring how feature weighting affects model performance. His contributions included writing code that was integrated into the latest version of SMACT, enabling new functionality such as Valence Electron Count calculations. Additionally, he helped improve SMACT’s usability by fixing examples and tutorials. These outcomes align with the project’s goal of mapping the chemical space of intermetallics and enhancing tools for materials discovery.
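As a loose illustration of how an electronegativity-difference feature can separate compound types, the sketch below computes the spread of Pauling electronegativities across the elements of a compound. The small lookup table, the function names, and the idea of passing a threshold are assumptions made for this example; this is not SMACT's API or the rule as implemented in the project.

```python
# Pauling electronegativities for a handful of elements (illustrative subset).
PAULING = {
    "Na": 0.93, "Mg": 1.31, "Al": 1.61, "Ti": 1.54, "Fe": 1.83,
    "Ni": 1.91, "Cu": 1.90, "Zn": 1.65, "O": 3.44, "Cl": 3.16,
}


def electronegativity_spread(elements):
    """Largest minus smallest Pauling electronegativity among the elements."""
    values = [PAULING[el] for el in elements]
    return max(values) - min(values)


def is_low_contrast(elements, threshold):
    """True if the spread is below a caller-chosen (hypothetical) threshold."""
    return electronegativity_spread(elements) < threshold


for compound in (["Na", "Cl"], ["Ni", "Al"], ["Cu", "Zn"]):
    print(compound, round(electronegativity_spread(compound), 2))
```

Metal-metal combinations such as Ni-Al give a small spread, while ionic pairs such as Na-Cl give a large one, which is why a feature of this kind is a natural input to the classification exercise described above.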

Through this project, Ryan gained a deeper understanding of materials informatics, including its potential to drive innovation in sustainable and high-performance materials across industries. He developed practical skills in machine learning, mastering basic techniques like regression and classification while beginning to explore advanced concepts such as regularization and dimensionality reduction. Through regular meetings and workshops, he improved his presentation and communication skills, learning to express ideas clearly in a research setting. Ryan also learned how to initiate and manage a research project, from defining objectives to developing impactful solutions. Importantly, he took his first steps into computational research, gaining confidence in coding and working within a collaborative research environment.

Modelling Phase Transitions: Characterising Henry’s Law

Josh Cheung
Host University: University of Southampton

Host Academic: Dr. Joanna Grundy, Prof. Jeremy Frey

The main focus of this project was to use machine learning models to predict values for solubility (log S) and Henry’s law constant (kH), exploring the links between the two properties. This was completed via data curation to create a combined dataset, followed by data processing, training and selection of machine learning models, and analysis of predictions against a withheld test set.

Two machine learning models were successfully developed to predict solubility and Henry’s law constant, trained on large datasets and tested on a shared subset of 2,563 datapoints. Both models achieved acceptable accuracy, with around 50% of predictions scoring below 1 for mean squared error (MSE), and the Henry’s law model performed exceptionally well for hydrocarbons. However, due to time constraints, the analysis of feature importance was not completed, leaving room for future investigation using techniques like recursive feature elimination and SelectKBest. A number of data quality and modelling flaws were identified, highlighting opportunities for improvement in data sanitation, feature engineering, and model refinement. The project lays a strong foundation for further development and analysis in predictive modelling of chemical properties.
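One way to read the accuracy figure above is as the share of test-set predictions whose individual squared error falls below 1. The sketch below shows that calculation alongside the overall mean squared error; the values are invented for illustration and the function names are our own, so it should not be taken as the project's evaluation code.

```python
def squared_errors(y_true, y_pred):
    """Per-sample squared error between true and predicted values."""
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]


def mean_squared_error(y_true, y_pred):
    """Average of the per-sample squared errors."""
    errs = squared_errors(y_true, y_pred)
    return sum(errs) / len(errs)


def fraction_below(y_true, y_pred, threshold=1.0):
    """Share of samples whose squared error is below the threshold."""
    errs = squared_errors(y_true, y_pred)
    return sum(e < threshold for e in errs) / len(errs)


# Invented log S style values, NOT data from the project.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 5.0, 4.5]

print("MSE:", mean_squared_error(y_true, y_pred))
print("fraction with squared error < 1:", fraction_below(y_true, y_pred))
```

Reporting both numbers is useful because a few large misses can dominate the mean even when most predictions are close.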

The project provided Josh with hands-on experience in data curation, model training, and result evaluation. They also gained valuable transferable skills in project organisation, time management, and scientific reporting. Importantly, this experience deepened their understanding of machine learning in physical chemistry and helped guide their career aspirations towards computational and data-driven research.

Accelerating Materials Discovery: Integrating Machine Learned Force Fields (MLFF) with Monte Carlo Simulations

Jay Zhou – University of Bath
Steve Parker – University of Bath
Tom Underwood – STFC, RAL

Project Summary:

Recently, the use of Machine Learned Force Fields (MLFFs) for Molecular Dynamics (MD) simulations has become popular. However, this is not the case for Monte Carlo (MC), which performs exceptionally well for systems such as gases, complex mixtures and adsorption. In fact, the machinery for MLFF-integrated MC is currently absent from open-source platforms. Therefore, the goal of this project is to develop an open-source software framework that integrates MLFFs with MC simulations to enhance the calculation of various thermodynamic properties of solid-state materials. By leveraging MLFFs trained on ab-initio data, we aim to improve the accuracy of MC simulations in predicting precise free energy properties, while offering significantly faster compute speed than ab-initio methods.
 
This approach addresses the limitations of classical force fields (CFFs), which are traditionally used to describe the interactions between atomic and molecular species in MC simulations but often fall short in accuracy. With this project, we commit to programming a reliable interface between the Monte Carlo simulation software package DL_MONTE (a member of the Daresbury Lab software suite) and MLFFs in the form of universal Python functions (callable by ASE). We will also test this interface extensively, using the CHGNet neural network force field to calculate thermodynamic properties of water via Grand Canonical Monte Carlo (GCMC). The final code package will be released as open source alongside detailed documentation and tutorials to attract community engagement and fast-track the adoption of AI in the MC simulation community.
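The basic idea of plugging an energy model into Monte Carlo can be sketched with a plain Metropolis loop in which the energy is supplied as a callable. The example below is a minimal canonical-ensemble toy in one dimension, with a harmonic function standing in for a learned force field; it is not the DL_MONTE interface, does not perform GCMC, and all names are hypothetical.

```python
import math
import random


def metropolis_accept_prob(delta_e: float, kT: float) -> float:
    """Metropolis acceptance probability for an energy change delta_e."""
    if delta_e <= 0.0:
        return 1.0
    return math.exp(-delta_e / kT)


def run_mc(energy_fn, x0, kT, n_steps, max_disp, seed=0):
    """Sample positions with Metropolis MC; energy_fn is any callable x -> energy."""
    rng = random.Random(seed)
    x, e = x0, energy_fn(x0)
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_disp, max_disp)  # symmetric trial move
        e_trial = energy_fn(trial)  # the call a learned model would serve
        if rng.random() < metropolis_accept_prob(e_trial - e, kT):
            x, e = trial, e_trial
        samples.append(x)
    return samples


def harmonic_energy(x: float) -> float:
    """Toy stand-in for an MLFF energy evaluation."""
    return 0.5 * x * x


samples = run_mc(harmonic_energy, x0=0.0, kT=1.0, n_steps=5000, max_disp=1.0)
print("mean x^2:", sum(s * s for s in samples) / len(samples))
```

Because the sampler only needs energies, swapping the toy function for a model-backed callable leaves the loop unchanged, which is the appeal of a generic Python interface of the kind described above.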

AI-Enabled Prediction of Lipid Membrane Composition from Optical Signatures

Dr. Miguel Paez Perez – Imperial College London
Prof. Marina Kuimova – Imperial College London

Project Summary:

Lipid membranes play a key role in biology; they provide structural integrity, are central in intercellular communication, control content exchange, and transduce extracellular signals. Their functionality arises from their unique structure and biophysical properties, which are dictated by the interaction between the membrane’s constituent lipid molecules. Deregulation of these lipid-lipid interactions has been linked to diseases including cancer, malaria, Alzheimer’s disease, and atherosclerosis. From a commercial perspective, membrane composition and lipid-lipid interactions influence the efficacy of lipid-based drug carriers, such as miRNA vaccines. Yet, there is a limited understanding of how the lipid composition of complex, multi-component membranes affects their biophysical behaviour.

To address this challenge, we are developing tools capable of monitoring the biophysical features of lipid bilayers with high throughput and at low cost. We will leverage an in-house high-throughput vesicle production device, optical readouts, and AI tools to unlock otherwise inaccessible insight into how the chemistry of complex lipid membranes dictates their biophysical properties.

This project will generate a publicly available, curated dataset to facilitate the development of data-driven biophysical models, and we anticipate its outputs will find applications in areas including antimicrobial resistance research and artificial cell development, supporting key UK strategic areas such as engineering biology.

AI-Enhanced Molecular Dynamics: Integrating Long-Range Interactions with Graph Neural Networks

Dr. Devis Di Tommaso – Queen Mary University of London
Assoc. Prof. Rachel Crespo-Otero – University College London
Prof. Greg Slabaugh – Queen Mary University of London

Project Summary:

Molecular Dynamics (MD) is an essential computational tool for studying atomistic-level phenomena. However, methods like ab initio MD, which rely on density functional theory (DFT) to compute energies and forces, are computationally expensive. On the other hand, methods based on classical interatomic potentials (IP) offer speed but lack flexibility. In recent years, machine learning (ML) approaches have promised DFT-quality simulations at faster speeds but often neglect long-range interactions, which are crucial for accurately describing liquids as well as gas-solid, liquid-solid, and biomolecular systems. This project aims to develop AI-enhanced MD methods that integrate long-range interactions for predicting both energies and forces. The outcome will be the first version of a code implementing MGT to partially or fully replace the costly and time-intensive traditional computational methods used at each timestep of MD simulations. The model will compute atomic forces, energies, and changes in atomic positions throughout the simulation, enabling a more efficient and scalable approach to studying molecular systems. This will facilitate accelerated atomistic MD simulations that account for both local and long-range interactions.
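To show where a learned model would sit in an MD loop, the sketch below implements a standard velocity Verlet integrator with the force supplied as a callable. A one-dimensional harmonic spring stands in for the expensive force evaluation; this is a generic textbook scheme under that toy assumption, not the project's code, and the names are hypothetical.

```python
import math


def velocity_verlet(force_fn, x, v, mass, dt, n_steps):
    """Integrate one particle in 1D; force_fn is any callable x -> force."""
    f = force_fn(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force_fn(x)                    # per-timestep force call a model could replace
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def spring_force(x: float) -> float:
    """Toy stand-in for a DFT or ML force evaluation (unit spring constant)."""
    return -x


def total_energy(x: float, v: float, mass: float = 1.0) -> float:
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * v * v + 0.5 * x * x


traj = velocity_verlet(spring_force, x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
x_end, v_end = traj[-1]
print("final position:", x_end, "expected about", math.cos(10.0))
print("energy drift:", total_energy(x_end, v_end) - total_energy(1.0, 0.0))
```

The force call is the only place the physics enters, so the cost and accuracy of that single function largely determine the cost and accuracy of the whole simulation.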

Bayesian Optimization for Accelerating Metal-Based Antibiotic Discovery

Dr. Angelo Frei – University of York
Dr. David Husbands – University of York
Dr. Athi Welsh – University of York

Project Summary:

Antimicrobial resistance to current treatments poses a growing threat to global healthcare. At the same time, the antibiotic development pipeline remains perilously stagnant. This project aims to accelerate the discovery of novel metal-based antibiotics by integrating machine learning (ML) with high-throughput chemical synthesis and biological evaluation. Collaborating with Atinary Technologies, we will leverage Bayesian Optimization to train ML models with our chemical libraries to predict and iteratively refine iridium(III) metalloantibiotics, maximizing antibacterial potency while minimizing toxicity.

Metal-based antibiotics offer unique structural and functional advantages over organic compounds, yet their discovery remains slow, partially due to the vast chemical space available. Traditional methods of structure-activity relationship elucidation are time-intensive and inefficient. By training an ML model on 1440 iridium(III) complexes, we will virtually screen ~400 million potential compounds from combinations of building blocks, dramatically enhancing the speed of hit identification and the hit-rate. From this virtual screen, two iridium(III) libraries will be synthesized and evaluated using an automated Opentrons system.

Overall, this project is anticipated to yield the following outcomes:

Computer Vision for Predicting the Impact of Additives in Protein Crystallisation

Prof. Bao Nguyen – University of Leeds
Dr. Briony Yorke – University of Leeds
CaiYun Ma – University of Leeds
Dr. Halina Mikolajek – Diamond Light Source

Project Summary:

The rise of new drug discovery modalities has underscored the need for efficient macromolecular crystallization, both as a key characterization technique and as a greener alternative to traditional purification methods in manufacturing. However, the weak intermolecular interactions in protein crystals often lead to instability, necessitating the use of additives that influence protein binding, ionic strength, or nucleation. While standardized crystallization screens are widely used, the underlying intermolecular interactions and nucleation/growth mechanisms remain poorly understood, with only limited systematic studies on a small subset of proteins.

This project aims to address these challenges using AI-driven computer vision to classify and extract morphological data from microscopic images of protein crystallization screens, sourced from the VMXi beamline at Diamond Light Source. The outcome will be an automated workflow for crystal characterization across diverse sources, a robust dataset of microscopic images (including negative results), and AI models trained to predict optimal crystallization conditions and additives. By improving crystallization success rates, this approach advances both protein characterization and scalable purification. Furthermore, the same AI-assisted image processing techniques can be extended to other materials, broadening their impact beyond biomolecular systems.

Enabling Data-Driven Discovery and Reaction Optimisation in Porous Organic Cage Synthesis

Dr. Benjamin D Egleston – Imperial College London
Dr. Rebecca Greenaway – Imperial College London

Project Summary:

Porous Organic Cages (POCs) are a class of molecular materials with tunable micropore structures that offer significant potential in separation technologies. Recently, our lab has been implementing machine learning tools to assess the accessibility of POCs by encoding chemists’ intuition (doi.org/10.1021/acs.jcim.1c00375). However, traditional methods for synthesising and analysing these materials’ reaction mixtures are limited due to the complexity of self-assembly and the unintuitive nature of the species formed. To address this challenge, the project will integrate robotic liquid handling and parallel synthesis with automated data processing and analysis to enable generation of large experimental datasets, providing structural information and thermodynamic data for these complex systems for machine learning or data-driven applications.

Building on recent progress in automated high-throughput screening for combinatorial synthesis of metal-organic cages (doi.org/10.26434/chemrxiv-2024-hl427-v4) and POCs (doi.org/10.1039/D3SC06133G), the project extends these methodologies to even more complex systems. The project will centre on identifying unintuitive structures and intermediates in reaction mixtures by generating large libraries of potential molecules to search for in mass spectrometry (MS) data. This will be combined with automated kinetic sampling and analysis of MS data in parallel reactions to enable mapping of entire reaction spaces.

One key goal of this project is to demonstrate that automation of the discovery process, from reaction preparation to data interpretation, can accelerate the identification of novel materials. Generation of much greater volumes of detailed data will allow for a deeper understanding of these complex systems. The resulting data-driven foundation will accelerate discovery of novel POCs and other structures that are challenging to predict using traditional intuition.

Exploration of defect superstructure phase diagrams in graphene with Bayesian AI

Dr. Lukas Hoermann – University of Warwick
Prof. Reinhard J. Maurer – University of Warwick
Dr. David Andrew Duncan – University of Nottingham
Dr. Alexander Saywell – University of Nottingham
Dr. Christopher Allen – University of Oxford

Project Summary:

The atom-scale design of two-dimensional materials, particularly defective graphene, shows great promise for catalysis, sensing, and energy storage. By integrating experimental growth and analysis with Bayesian-AI-enabled configuration space prediction via the SAMPLE code, we will lay the groundwork for the experimental design to efficiently explore the phase diagram of defective graphene. This project will uncover how experimental parameters—temperature and gas flux—influence the formation of defect superstructures in graphene that govern its electronic and mechanical properties.

Using SAMPLE, we will generate a comprehensive phase space of hundreds of millions of defect superstructures and efficiently predict their formation energies. This dataset will be available on the NOMAD database. We will calibrate the theoretical phase diagram using TEM and AFM images of N-defects in graphene from our collaborators David Duncan, Alexander Saywell (University of Nottingham), and Christopher Allen (Diamond Light Source, University of Oxford). By mapping the experimental structures with a tessellation code (Duncan and Saywell) and computing their formation energies with SAMPLE, we will place these structures within the phase diagram. Using Bayesian-AI, we will learn the functional dependence of the N-concentration and defect composition on the deposition temperature and gas flux during sample preparation. This will enable the prediction of defect patterns at a given deposition temperature and guide future experiments to achieve graphene layers with targeted defect superstructures. The developed approach will be broadly applicable to any defective two-dimensional material or surface, offering a versatile framework for precision surface engineering in a range of applications.

High-Throughput Data-Driven Electrolyte Design to Enable Lithium Metal Batteries

Dr. Neubi Xavier – University of Surrey
Dr. Matthias Golomb – University of Surrey

Project Summary:

Rechargeable batteries are a major part of our everyday lives and improving them further is crucial for future technology. The gold standard for the high-performance next-generation batteries is the use of lithium metal as the anode material. One of the major bottlenecks to enabling lithium metal batteries is the increased reactivity between current electrolyte formulations and lithium, leading to uncontrollable side reactions during operation and ultimately causing battery failures.

Researchers are currently focusing efforts on engineering new electrolyte formulations, leading to hundreds of scientific papers being published each week. The amount of data generated makes it impossible for a single researcher to follow all available literature and hinders the rational design of new electrolytes.

In this project, Dr. Neubi Xavier and Dr. Matthias Golomb aim to collate this vast amount of data into an accessible database that will establish clear reporting standards and serve battery scientists, computational chemists, and AI researchers as a starting point for further experimental and computational investigations. Using large language models, they aim to extract property information on lithium metal electrolytes from a wide range of available scientific literature and identify common core descriptors for high-performing candidates. In addition, they will combine high-throughput atomistic simulations and machine learning to fill gaps in the resulting database, aiming to create the most complete and standardized picture of the lithium metal-compatible electrolyte research landscape to date.

Transforming Chemistry Labs with Safe and Intuitive Human-in-the-Loop Robotic Systems

Dr. Luis Figueredo – University of Nottingham
Dr. Ayse Kucukyilmaz – University of Nottingham
Dr. Gabriella Pizzuto – University of Liverpool

Project Summary:

This project aims to transform chemistry labs through robotics and AI—overcoming adoption barriers while enhancing safety and efficiency. We’ll develop a framework that empowers chemists to intuitively teach robots experimental tasks via multimodal demonstrations, eliminating the need for programming expertise while ensuring stringent safety for seamless human-in-the-loop (HIL) operation. Our approach leverages generative AI for semantic scene understanding, grounded in model-based representations to enhance explainability and safety. This enables robots to interpret dynamic lab environments and manipulate glassware and hazardous substances. A certified safety layer ensures compliance with strict standards, advancing HIL automation in chemistry and aligning with AIchemy’s mission to foster intuitive, high-trust robotics in scientific research.

The automation of chemistry labs remains challenging due, among other reasons, to the requirements for precise and safe manipulation of hazardous substances, diverse glassware, and evolving experimental setups. Traditional robotic solutions require extensive programming expertise, limiting accessibility. Our approach leverages multimodal human demonstrations—combining kinaesthetic, visual, and haptic inputs—to develop constraint-based robotic behaviours that chemists can intuitively guide. Certified safety layers ensure secure robotic handling of hazardous liquids, enabling seamless human-robot collaboration in high-stakes lab environments.

Intended Outcomes

  • A human-in-the-loop robotic framework for intuitive chemistry task learning.
  • A certified safety architecture ensuring reliable robotic execution in human-centred labs.
  • A real-world lab demonstration showcasing precise liquid handling such as controlled pouring and liquid-liquid extractions.
  • Open-source tools and methodologies to foster adoption and collaboration.

This project lays the foundation for future fully automated chemistry labs, accelerating discovery while maintaining human oversight and safety.

X-GAMES: Crystallography with Machine Learning

Prof. Craig Butts – University of Bristol
Dr. Calvin Yiu – University of Bristol

Project Summary:

We will build a proof-of-principle generative AI tool – X-GAMES – to identify chemical structures directly from powdered samples by combining NMR spectroscopic and X-ray diffraction data. This is of significant value to the pharmaceutical industry, where the chemical structure of molecules and their packing in crystals control their drug properties.

Existing generative AI methods are very good at creating a myriad of images or text on a generalised subject and can also be taught to create molecules that fit broad characteristics, e.g. “make me a molecule that might be drug-like”. However, generative structure determination is a much harder challenge – as it requires generating the one-and-only chemical structure that fits uniquely to a particular set of spectroscopic data. At Bristol we have developed early prototype systems capable of doing this albeit only for molecules with a few atoms – based on ‘inverting’ a neural network version of our existing IMPRESSION machine learning architecture that was designed to predict solution state NMR parameters. Essentially this prototype predicts structures from spectra, rather than spectra from structures.

The goal of this X-GAMES project is to build out this prototype so that it works for larger, more complex drug-like molecules. To achieve this, we will train X-GAMES on 10-100x larger datasets that integrate X-ray diffraction data as well as the NMR data that our prototype is already designed to use.

A Computer Vision Approach to Understanding Polymer Swelling Kinetics

Alex McKissock
Host University: University of Strathclyde

Host Academic: Dr. Marc Reid

This project focused on studying gallium deformation in acidified salt solutions and analysing the movement with Kineticolor computer vision software.

Alex’s internship project successfully adapted Kineticolor computer vision software to investigate gallium metal locomotion, building on recent research in liquid metal chemistry. He developed expertise in computer vision analysis by studying published examples and producing high-quality experimental videos using prepared salt solutions. His findings revealed distinct stages of gallium deformation and showed that acidity accelerates the process, with Galinstan breaking down in acidified silver nitrate. The project demonstrated the potential of computer vision in studying dynamic materials and highlighted its relevance to soft robotics and chemistry education.

By working in a research lab, Alex gained practical skills in experimental design, column handling, and understanding the selectivity of various amine sensors. He also learned to optimise camera and lighting setups to overcome challenges posed by reflective liquid metals, and explored the versatility of Kineticolor’s analysis methods. The project reinforced key chemistry concepts such as redox reactions, electrochemistry, and precipitation, deepening his theoretical and practical understanding.

A Fast and Justifiable AI approach to characterising hydrogen bonds

Jack Gallimore
Host University: University of Liverpool

Host Academic: Dr. Olga Anosova

The aim of this project was to develop a more transparent and geometrically justified method for identifying hydrogen bonds in proteins, with a focus on detecting and classifying common secondary structures like helices. Unlike traditional tools such as DSSP, Jack’s approach sought to eliminate reliance on manually set thresholds, enabling robust and consistent analysis across all experimental structures in the Protein Data Bank.

At the start of his project, Jack explored various geometric features to reliably identify hydrogen bonds in protein structures, focusing on relationships between residues i and i+3, i+4, and i+5. After analysing distributions across the Protein Data Bank, he established a transparent rule based on oxygen–nitrogen distance and C–O–N angle, which showed strong predictive power for secondary structure formation. He plans to continue refining these thresholds with Dr. Olga Anosova, aiming to publish the work later this year.
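As a rough illustration of what a rule based on oxygen–nitrogen distance and C–O–N angle can look like, the sketch below tests a single candidate bond from three atomic coordinates. The cut-off values and function names are hypothetical placeholders chosen for the example; they are not the thresholds derived in the project.

```python
import math

# Hypothetical cut-offs for illustration only; NOT the project's values.
MAX_ON_DISTANCE = 3.5   # angstroms
MIN_CON_ANGLE = 120.0   # degrees


def angle_deg(a, b, c):
    """Return the angle at vertex b (in degrees) formed by points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def is_hydrogen_bond(c, o, n,
                     max_dist=MAX_ON_DISTANCE,
                     min_angle=MIN_CON_ANGLE):
    """Apply a distance-and-angle rule to carbonyl C, carbonyl O and amide N.

    c, o, n are (x, y, z) coordinates in angstroms.
    """
    on_distance = math.dist(o, n)
    con_angle = angle_deg(c, o, n)
    return on_distance <= max_dist and con_angle >= min_angle
```

In practice the coordinates would come from residues i and i+3, i+4 or i+5 of a parsed PDB entry, and the cut-offs would be set from the observed distributions rather than fixed by hand.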

Jack learned to critically structure and produce research-grade work while handling noisy, large-scale data spanning 100,000 PDB files. He gained practical experience with Python libraries and efficient coding techniques, deepening his understanding of protein geometry and hydrogen bonding through hands-on analysis and visualization.

Computational Design of New Chiral Metal Halide Semiconductors Using High-Throughput and Machine Learning

Menna Shirras
Host University: Lancaster University

Host Academic: Dr. Nourdine Zibouche

This project aimed to design new chiral metal-halide semiconductors (CMHS) using machine learning techniques. This involved exploring chemical space with USPEX, compiling structural data, performing DFT calculations, applying supervised ML models to predict bandgaps, and refining promising candidates through high-level computations.

A curated library of 136 CMHS structures was compiled from literature and expanded through substitutions, with additional hypothetical structures attempted via USPEX, though limited by time and computational resources. Electronic data was generated using loose SCF calculations in Quantum ESPRESSO, and inconsistencies in literature data led to the use of a larger, cleaner dataset of 240 organometal halide perovskites for machine learning analysis. Three supervised regression models (Linear Regression, Decision Tree, and Random Forest) were implemented and tuned using Scikit-learn to predict bandgap values, with performance evaluated using MAE, MSE, and R² metrics.
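For readers unfamiliar with the three evaluation metrics, the sketch below computes MAE, MSE, and R² from their standard definitions in plain Python. The function name is hypothetical and the numbers in the usage note are made up for illustration; the project itself used Scikit-learn.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for paired reference and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return mae, mse, r2
```

For example, `regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])` gives an MAE and MSE of one third and an R² of 0.5.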

The project allowed Menna to gain practical experience with Python libraries and machine learning algorithms, applying them to materials design and data analysis. She developed strong skills in data visualization and model evaluation, and now aims to expand her knowledge by exploring unsupervised ML techniques and applying her models to larger datasets.

Develop an AI data audit tool to help with data ingestion for the Physical Data Science Data collection

Oscar Robinson
Host University: University of Southampton

Host Academic: Dr. Matthew Partridge and Dr. Samantha Pearman-Kanza

The aim of this project was to perform an initial investigation into the development of an AI-driven data audit tool to streamline the ingestion of datasets into the Physical Chemistry Properties Data Collection (PChProp). The tool needed to be able to analyse submitted datasets, identify inconsistencies or missing metadata, and propose corrections or canonical mappings to align with community standards and support ingestion to the database where the collection is hosted.

Oscar’s project produced a terminal-driven Python script called SAM, the Solubility Audit Manager. The tool is a command-line chatbot that uses a local HuggingFace LLM, heuristics, and natural language parsing to guide the user through the ingestion stages of building a solubility database.

Oscar also learnt how to write in a Pythonic fashion, moving from Jupyter Lab and .ipynb notebooks to multi-file projects assembled in PyCharm. He gained experience applying local language models, natural language parsing, and regular expressions to import and process chemical data.

Exploring materials space with optimal transport

Zibo Zhou
Host University: University College London

Host Academic: Dr. Keith Butler

The project aimed to develop a systematic, data-driven framework for analogy-based materials discovery by integrating structural and compositional data using fused Gromov-Wasserstein (FGW) distance. This approach was specifically applied to estimate the spectroscopic limited maximum efficiency (SLME) of materials, enabling more informed predictions in materials science.

Zibo’s project demonstrated that the fused Gromov-Wasserstein (FGW) approach performs well in materials discovery tasks with small training sets, achieving results comparable to advanced models like ALIGNN and CrabNet. FGW’s predictive accuracy was highly dependent on the choice of reference material, making it especially useful when data are scarce but reliable predictions are needed. In contrast, ALIGNN and CrabNet showed superior performance as training set size increased, offering practical guidance for model selection based on data availability.

Through this project, Zibo gained a strong understanding of optimal transport theory and the fused Gromov-Wasserstein (FGW) distance, learning how it integrates structural and compositional data for materials prediction. He developed insight into the SLME property in photovoltaics and learned to evaluate model performance using various metrics across different training set sizes. His coding and computational skills improved significantly, enabling him to build efficient data pipelines and analyse large datasets. Additionally, he strengthened his collaboration and project management skills while working within a research group.

Human-AI Collaborative Closed-Loop Optimization Online Platform Development

Maxime Atkinson
Host University: University of Liverpool

Host Academic: Dr. Xenofon Evangelopoulos

The project focused on gaining an introduction to machine learning for closed-loop chemical discovery, with an emphasis on algorithmic benchmarking and orientation within the chemical discovery process.

Maxime developed and applied machine learning techniques, including Bayesian optimization and neural networks, to support chemical discovery tasks. He built and tested optimization algorithms using various surrogate models and acquisition functions on both real and synthetic datasets. Additionally, he implemented Physics-Informed Neural Networks (PINNs) for voltammetry modeling and produced functional codebases for predictive modeling, optimization loops, and data analysis in chemistry.

Through this project, he gained mastery of key tools including Git, VS Code, scikit-learn, BoTorch, Keras, and TensorFlow, while developing a deep understanding of Bayesian optimization, surrogate models, and acquisition functions and constraints. He learned to structure machine learning projects from data preprocessing to deployment and built skills in neural network architectures, ensemble methods, and Physics-Informed Neural Networks (PINNs) for scientific applications. Additionally, he gained insights into voltammetry’s chemistry and mathematics, as well as valuable experience in research collaboration and academic career development.

Impact of finer k-point sampling on vibrational free energy and crystal structure ranking accuracy

Leo Arogundade
Host University: University of Southampton

Host Academic: Prof. Graeme Day

Leo’s project built on his third-year dissertation by exploring how finer k-point sampling influences vibrational free energy calculations in crystal structure prediction (CSP) workflows. By improving the accuracy of energy data, the work sought to improve the reranking of predicted crystal structures and assess the trade-off between computational cost and prediction reliability.

The project successfully extended the analysis to an additional 70 crystal landscapes, significantly broadening the dataset beyond the initial dissertation work. The results showed that 16 crystals experienced an improvement in their ranking. In contrast, 29 crystals saw a worsening of their ranks, while 25 crystals maintained their original rankings. This highlighted the nuanced role of vibrational energy in crystal stability, with polymorphs showing the most inconsistent rerankings due to their subtle energetic differences. These findings suggest that while refined calculations can enhance accuracy, they may also introduce variability, especially for polymorphs. Overall, the project indicated that vibrational energy metrics could complement existing CSP workflows to improve prediction reliability.

Leo deepened his understanding of computational chemistry and crystal structure prediction theory, gaining new skills in vibrational energy calculations, data analysis, and scientific programming. He became confident in handling independent research challenges and learned how theoretical models and practical data intersect in materials discovery. Working with advanced computational methods also strengthened his ability to evaluate model performance and manage complex workflows.

Machine Learning acceleration of metadynamic simulations of antimicrobial peptides

Alexandre Peuch
Host University: Imperial College London

Host Academic: Dr. Jarvist Frost

Alexandre’s project aimed to accelerate metadynamic simulations of antimicrobial peptides (AMPs) using machine learning techniques.

In the first part of the project, he developed a Monte Carlo algorithm based on amino acid coupling efficiencies to predict likely byproducts in peptide synthesis, guiding experimental efforts to identify active compounds in impure samples. The second part involved running 32 metadynamics simulations using both traditional and machine learning force fields (AMBER99SB-ILDN and Grappa) to study folding and dimerisation of four AMPs in polar and non-polar solvents. The results revealed consistent trends but also discrepancies—particularly in dimerisation—highlighting the importance of bonded parameters and the potential of ML-based force fields in capturing complex aggregation behaviours.
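The idea behind a coupling-efficiency Monte Carlo can be sketched in a few lines. The version below is written in Python for illustration (the project used Julia) and assumes the simplest possible model: each residue couples independently with a given probability, and a failed coupling produces a deletion sequence. The function name, the efficiency values, and the deletion-only model are assumptions, not the project's actual algorithm.

```python
import random
from collections import Counter


def simulate_synthesis(sequence, efficiency, n_trials=10_000, seed=0):
    """Sample products of a stepwise peptide synthesis.

    sequence   -- target peptide as a string of one-letter residue codes
    efficiency -- dict mapping residue code to coupling probability (0..1);
                  residues missing from the dict are assumed to couple fully
    Returns a Counter mapping each product sequence to how often it occurred.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    products = Counter()
    for _ in range(n_trials):
        # A residue is incorporated only if its coupling step succeeds;
        # otherwise it is skipped, giving a deletion byproduct.
        product = "".join(
            res for res in sequence
            if rng.random() < efficiency.get(res, 1.0)
        )
        products[product] += 1
    return products


if __name__ == "__main__":
    # Hypothetical efficiencies, for illustration only.
    counts = simulate_synthesis("GAVLK", {"V": 0.90, "K": 0.95}, n_trials=5000)
    for product, count in counts.most_common(4):
        print(product, count)
```

Ranking the sampled products by frequency gives a shortlist of the byproducts most likely to be present in an impure sample.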

Alexandre developed a wide range of computing skills during his project, including proficiency with the UNIX command-line, virtual environments, Git, and high-performance computing. He gained hands-on experience with Monte Carlo simulations using Julia and Molecular Dynamics simulations via GROMACS, as well as applying cutting-edge machine learning techniques to study antimicrobial peptides. Notably, he trained neural networks to generate machine-learned collective variables for efficient biasing of MD simulations. These experiences have strengthened his interest in computational chemistry and confirmed his intention to pursue a PhD in the field.

Machine Learning Enabled Discovery of Point Defects Qubits for Quantum Technologies

Atshaam Ashraf
Host University: Imperial College London

Host Academic: Dr. Alex Ganose

The aim of this project was to use machine-learned interatomic potentials (MLIPs) to screen defect complexes that are suitable for use as point-defect qubits. The screening is intended to identify hosts that exhibit low defect energies, long coherence lifetimes and the required symmetries for optically addressable qubits, so that they can be put forward for further study by experimental and computational groups in quantum technologies.

Atshaam’s project involved calculating defect formation energies for CrAl neutral substitution defects in aluminium oxide phases using MLIPs and various computational tools. While MLIPs showed good agreement with DFT for primitive cells, they produced unphysical negative energies for defect structures, even after GGA+U corrections. This revealed limitations in the MLIP training data, which lacked accuracy for localised defect states. The findings highlight the need to train MLIPs on hybrid functional data to improve their reliability beyond bulk thermodynamics.

Through this project, he gained hands-on experience with key tools in computational materials chemistry, including the Materials Project API, doped and ShakeNBreak packages, and VASP for DFT calculations on Imperial’s HPC cluster. He learned how machine-learned interatomic potentials (MLIPs) can significantly reduce computational cost and accelerate research workflows. The project also introduced him to defect chemistry, deepening his understanding of material properties and their theoretical foundations. As his first independent research experience, it strengthened his Python, data analysis, and problem-solving skills, and confirmed his interest in pursuing a research career in atomistic simulations using machine learning.

Machine Learning Force Fields for CO₂ Adsorption and Reduction in Porous Solids

Dongin Kim
Host University: Imperial College London

Host Academic: Prof. Aron Walsh

Dongin’s project aimed to create a general, descriptor-driven framework to connect homogeneous and heterogeneous catalysis by aligning electronic properties. Using the d-band centre of a homogeneous Rh–phosphine complex as a benchmark, Dongin designed Rh–P nanoparticles with matching surface electronic structures. This was achieved through a closed-loop workflow combining ML-accelerated molecular dynamics, DFT calculations, bonding analysis, and catalytic testing, all scripted for reproducibility and future application to other material systems.

The project found that the surface d-band centre deviation is the most reliable predictor of catalytic performance, outperforming geometric metrics. Using this insight, Rh₃P was identified as the optimal composition, surpassing previously favoured Rh₂P due to better electronic alignment with the homogeneous benchmark. Mechanistic analyses (Bader charge, XPS, COHP/ICOHP, PDOS) consistently showed that Rh₃P offers balanced bonding and near-optimal adsorbate interaction. Rh-rich compositions failed due to electronic phase separation and excessive Rh–Rh delocalisation. The team also delivered a reproducible ML-MD + DFT pipeline and proposed a transferable “electronic alignment” design principle for heterogeneous catalysts.

Dongin gained practical experience in applying AI-based molecular dynamics to a real-world catalytic design problem. She learned to use machine learning force fields (MACE and MatterSim) within ASE to run annealing–quenching simulations and validate structures before DFT analysis, deepening her understanding of the strengths and limitations of current ML potentials. She also developed skills in building reproducible computational workflows, integrating ML-MD, DFT, and electronic structure analysis into a coherent pipeline. This included automating nanosphere generation, MD runs, and figure preparation with statistical annotations. Overall, Dongin came to appreciate how AI-enabled simulations can accelerate catalyst discovery by bridging exploratory structure generation with rigorous quantum chemical analysis.

Multi-Objective Bayesian Optimisation Approach Towards Advancing Automated Liquid-Handling Platforms

Xiaojun Hu
Host University: Imperial College London

Host Academic: Prof. Becky Greenaway

The project aimed to apply Bayesian Optimisation (BO) to optimise solvent-handling parameters on the Opentrons OT-2, a robotic liquid-handling platform. A new evaluation metric, the Sum of Absolute Differences (SAD), was introduced to improve the accuracy and precision of solvent transfers. Building on this, the goal is to develop a fully automated closed-loop workflow that runs experiments with minimal human intervention. This approach enhances efficiency, throughput, and safety, marking a key step toward the next generation of chemistry laboratories.

Xiaojun conducted dispensing experiments with ethanol and chloroform on the Opentrons OT-2, identifying seven key design variables that influence solvent transfer accuracy. She introduced the Sum of Absolute Differences (SAD) as a new metric, which showed stronger correlation with ideal dispensing behaviour than traditional R² measures. Using Web-BO with a Latin Hypercube Sampling strategy, she successfully optimised ethanol dispensing within four iterations, while chloroform required further refinement due to its complex physical properties. In the final phase, she contributed to developing a fully automated closed-loop dispensing and weighing station, writing and testing custom protocols. This work lays the foundation for autonomous experimental workflows with minimal human intervention.
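The page does not give the exact definition of SAD used in the project. A plausible reading, assumed here, is the sum over all dispense steps of the absolute difference between the measured and the target amount. The sketch below implements that reading alongside a squared Pearson correlation, to show why a correlation-style measure can miss a systematic under-dispense that SAD picks up. All names and numbers are illustrative.

```python
def sum_of_absolute_differences(measured, target):
    """Assumed SAD definition: sum of |measured - target| over all dispenses."""
    if len(measured) != len(target):
        raise ValueError("measured and target must have the same length")
    return sum(abs(m - t) for m, t in zip(measured, target))


def squared_correlation(measured, target):
    """Squared Pearson correlation between measured and target values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_t = sum(target) / n
    s_mt = sum((m - mean_m) * (t - mean_t) for m, t in zip(measured, target))
    s_mm = sum((m - mean_m) ** 2 for m in measured)
    s_tt = sum((t - mean_t) ** 2 for t in target)
    return (s_mt * s_mt) / (s_mm * s_tt)


if __name__ == "__main__":
    target = [100.0, 200.0, 300.0]     # hypothetical target amounts
    measured = [90.0, 180.0, 270.0]    # consistently 10 % low
    print(squared_correlation(measured, target))          # perfectly linear
    print(sum_of_absolute_differences(measured, target))  # still penalised
```

A run that is consistently 10 % low is perfectly linear, so the correlation is at its maximum, while SAD still reports the accumulated error.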

Through this project, Xiaojun learned to operate and troubleshoot the Opentrons OT-2, design fair performance metrics, and apply Bayesian Optimisation for experimental parameter tuning. She gained experience in collaborative coding workflows using Python, VS Code, and GitHub, and developed confidence in presenting results to a research group. Working closely with the Greenaway team, she learned to collaborate effectively and engage in a dynamic research environment. Under the guidance of doctoral researchers Sean Gurung and Alex Ostudin, and with technical support from Dr. Austin Morz, she gained valuable exposure to Web-BO. Xiaojun is especially grateful to Prof. Becky Greenaway for the opportunity to explore the field of computational chemistry.

NeuralBind: Enhancing chemical coverage and diversity of training data for binding-affinity predictions.

Savva Grevtsev
Host University: University of Oxford

Host Academic: Prof. Philip Biggin

The aim of this project was to train ML models (RFscore, EHIGN, AEV-PLIG) on various real-world and synthetic datasets, evaluate their performance, and infer what the models learn and how well they generalise, as well as to assess the feasibility of using synthetic data in binding affinity prediction.

Savva contributed as second author to a published paper (arXiv:2507.07882) and made significant contributions to multiple codebases. The project revealed that both synthetic and real-world datasets in molecular modelling suffer from inconsistent quality, and many commonly used benchmarks and tools are flawed. Despite advances in model architectures, performance has largely plateaued, with GNNs showing modest improvements due to better inductive biases. While synthetic data can help, stringent quality control is essential: large volumes of mediocre data offer little benefit. The findings suggest that future progress in the field will depend heavily on large-scale, high-quality data efforts, unless a new modelling paradigm emerges.

Savva gained deeper familiarity with Python packages, the Git version control system, and working with remote HPC clusters, while training and applying machine learning models, primarily random forests and graph neural networks, for binding affinity scoring. He developed skills in dataset curation and model performance evaluation, which are essential in bio/cheminformatics. The project also highlighted the widespread issue of poorly documented and unreliable codebases in the field, which he found frustrating, though he noted gradual improvements. Overall, the experience broadened his technical capabilities and reinforced the importance of clean, maintainable code in computational research.

Towards the Development of Asynchronous Solvent Handling Capabilities for Automated Liquid Handling

Jessica Lai
Host University: Imperial College London

Host Academic: Prof. Becky Greenaway

This project focused on improving the accuracy of the Opentrons OT-2 robotic liquid handler when dispensing volatile organic solvents, which often drip due to pressure build-up from water-based calibration. To tackle this, a custom function introducing an additional air-gap step was developed to mitigate dripping. This function was integrated into a closed-loop Bayesian optimisation workflow, allowing automated optimisation of pipetting parameters across various solvents. The approach reduces human bias, time, and labour compared with traditional manual optimisation.

Jessica successfully developed a custom midpoint function for the Opentrons OT-2, which introduced an air-gap step between the pipetting origin and destination to reduce solvent dripping. This function was optimised for volatile and heavy solvents like DCM, Et₂O, and CHCl₃, resulting in lower standard deviations and improved linearity in dispensing accuracy. She implemented a web-based Bayesian optimisation (Web-BO) workflow to fine-tune seven key pipetting parameters, using defined ranges and iterative feedback. A fully automated closed-loop system was established, integrating the OT-2, a DOBOT robotic arm, and a weighing balance to execute and evaluate dispensing tasks autonomously. The midpoint function enhanced the precision of the optimisation process, supporting more reliable and consistent solvent handling.

Through this project, Jessica strengthened her Python programming skills by troubleshooting and developing a custom midpoint function, gaining confidence in error handling and collaborative coding. She expanded her technical knowledge by working with Linux, GitHub, and PowerShell, and learned to adapt her code within a team setting. The project introduced her to the Opentrons OT-2, where she explored its hardware and software architecture, including low-level API control. She also gained a solid understanding of Bayesian optimisation and its application in refining experimental conditions efficiently. Beyond technical skills, Jessica improved her presentation abilities and developed a deeper appreciation for consistency and teamwork in a research environment.

Transforming Undergraduate Learning. Migration of a Student Computational Drug Discovery Assignment to a Python Based Project.

Zhaohui Jiang
Host University: Manchester Metropolitan University

Host Academic: Dr. Alex Aziz

Zhaohui’s project aimed to transition a second-year computational drug discovery assignment from SCIGRESS, a proprietary tool, to an open-source Python-based workflow using Jupyter Notebook. The goal was to equip students with practical Python programming and data analysis skills while engaging them in virtual drug screening tasks. Using libraries like NumPy, Pandas, Matplotlib, and Scikit-Learn, the workflow introduces QSAR modeling, Lipinski’s Rule of Five, and IC₅₀ data analysis for HIV-1 protease compounds. He developed interactive teaching materials structured for delivery over five sessions, ensuring accessibility and reproducibility. Ultimately, the project supports modern, accessible computational chemistry education using open tools and real-world datasets.

He successfully developed a Python-based virtual drug screening pipeline, retrieving and cleaning IC₅₀ data for HIV-1 protease from ChEMBL, calculating molecular descriptors with RDKit, and applying Lipinski’s Rule of Five for initial filtering. He built and evaluated QSAR models using Scikit-Learn, finding that random forest regression outperformed linear regression in predictive accuracy. This model was then used to screen compounds from the SWEETLEAD database, identifying the top ten potential inhibitors. On the teaching side, Zhaohui created interactive Jupyter Notebook tutorials covering molecular visualization, cheminformatics, and machine learning, enhanced with ipywidgets and matplotlib. These deliverables provide a complete research-to-teaching workflow, supporting future interdisciplinary education in AI-driven drug discovery.
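The Lipinski filtering step can be sketched without any cheminformatics dependency if the descriptors are assumed to be precomputed (in the project they were calculated with RDKit). The class and function names below are hypothetical, and the default of tolerating one violation is a common convention rather than a setting confirmed by the project.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors for one compound."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient (log P)
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria a compound breaks."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Return True if the compound breaks at most max_violations criteria."""
    return lipinski_violations(d) <= max_violations


if __name__ == "__main__":
    # Hypothetical descriptor values, for illustration only.
    small = Descriptors(mol_weight=320.4, logp=2.1, h_donors=2, h_acceptors=5)
    large = Descriptors(mol_weight=720.9, logp=6.3, h_donors=4, h_acceptors=9)
    print(passes_rule_of_five(small), passes_rule_of_five(large))
```

Setting `max_violations=0` gives the stricter filter that some screening workflows prefer.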

Through this project, he developed both technical and conceptual skills in applying Python and AI to chemistry. He learned to visualise and analyse chemical data using tools such as Py3Dmol, ChEMBL, and RDKit, exploring the relationship between molecular properties and pharmacological activity. Using Scikit-Learn, he built and compared QSAR models, gaining insight into model performance, feature selection, and data quality. Additionally, by creating interactive Jupyter Notebook tutorials, he enhanced his ability to communicate complex research processes and deepened his understanding of how AI can advance chemical research.

AI Agents for Chemistry: Designing the LLM-CDS Prototype

Meng Fang – University of Liverpool

Purpose, Aims, and Scope of the Visit:

This three-month visit to University College London, hosted by Prof. Jun Wang, will focus on designing and prototyping a Large Language Model-based Chemical Data Scientist (LLM-CDS). Building on the Agent K framework, the project aims to create an autonomous agent capable of structured reasoning across diverse chemical data and literature to support reaction prediction, property inference, and hypothesis generation. The visit will deliver a functional prototype, establish benchmarking protocols, and develop collaborative outputs including a technical paper and funding proposal. This collaboration combines expertise in AI agents and chemistry, advancing trustworthy, data-driven automation in chemical discovery.

Autonomous Optimisation of Sustainable Ligand Synthesis for Materials Discovery

George Lyall-Brookes – University of Liverpool

Purpose, Aims, and Scope of the Visit:

George’s proposed visit to the University of Cambridge, hosted by Prof. Alexei Lapkin, will focus on developing a sustainable and transferable route to polyaromatic ligand precursors using self-optimising algorithms in flow chemistry. These ligands are essential for constructing advanced materials, yet their synthesis remains a bottleneck. By applying multi-objective optimisation and autonomous closed-loop systems, George aims to create high-yielding, cost-effective, and environmentally friendly procedures. The visit will combine hands-on lab work with algorithmic development, increasing his expertise in flow chemistry and AI-driven synthesis while contributing to scalable solutions for high-throughput materials screening.

Generative AI-Guided Discovery of Polar Materials for Ferroelectric Applications

Andrij Vasylenko – University of Liverpool

Purpose, Aims, and Scope of the Visit:

This collaborative visit aims to establish a computational–experimental partnership between Andrij and Empa to accelerate the discovery of new non-centrosymmetric polar materials, particularly ferroelectrics. The project will apply PIGEN, a new in-house physics-informed generative AI model for inorganic crystal structure design, to predict and prioritise promising candidates for experimental synthesis. Two short visits will align research goals, validate AI-generated structures, and refine modelling strategies toward publication and a Horizon Europe proposal. Combining AI-based prediction with Empa’s synthesis expertise, the collaboration will demonstrate the power of generative models in guiding experimental discovery of functional materials.

Integrating Machine Learning Potentials with Neutron Scattering for Advanced Materials Simulation at the European Spallation Source

Harry Richardson – University of Bristol

Purpose, Aims, and Scope of the Visits:

This visit supports the development of a machine learning-enhanced workflow for analysing neutron scattering data at the European Spallation Source (ESS). The project focuses on integrating machine learning interatomic potentials (MLIPs) into neutron scattering data workflows, enhancing automation and analytical precision. Through collaboration with the ESS Data Management and Software Centre, Harry will develop methods linking MLIP simulations to experimental scattering pipelines. He will also explore opportunities for ESS “first science” experiments validating MLIPs using Quasi Elastic Neutron Scattering (QENS) on liquid systems and refine fine-tuning strategies for disordered materials. This collaboration will strengthen the synergy between machine learning and neutron scattering, ensuring data-driven, physics-informed analysis is embedded in future ESS research infrastructure.

Understanding of mechanistic principles in gold(I)-catalysed synthesis using a top-down machine learning approach

Risnita Listyarini – University of Strathclyde

During this two-month visit to the University of British Columbia, Risnita will collaborate with Dr. Jolene Reid to apply data-driven machine learning approaches to study asymmetric gold(I)-catalysed reactions. The project aims to uncover mechanistic principles governing enantioselectivity using top-down modelling techniques such as clusterwise linear regression, which identify key molecular interactions with minimal human bias. By developing predictive statistical models, the collaboration seeks to extend mechanistic understanding across diverse reaction types, improving catalyst design and reaction optimisation. This work will enhance Risnita’s expertise in data-driven chemical modelling while advancing the broader goal of applying AI to complex organic reaction mechanisms.

Purpose, Aims, and Scope of the Visit: