Alchemy’s five forerunner projects are the initial research focus and cover a variety of different AI for Chemistry topics and challenges, applicable to both industry and academia, with partners ‘mapping’ onto projects as they develop. Future challenges facing the industrial chemical sector will be identified via our Industry Partners and translated to live research projects, facilitated through Alchemy’s funding calls.
>_F1: Human-in-the-Loop
>_F2: Large-Scale Crystal Structure Prediction
>_F3: Data-Driven Materials Discovery
>_F4: Generative AI for Small Molecules and Materials
>_F5: Identifying and Optimising Reaction Mechanisms
Combining human hypothesis and Bayesian Optimisation will allow chemists to build human intelligence and chemical knowledge into closed-loop AI-driven robotic experiments. The synergy between a human in the loop approach and robotics will enable the accelerated discovery of materials and properties.


University of Liverpool
Prof. Andy Cooper, Dr Xenofon Evangelopoulos
Research Update 1
Scientists from the Departments of Computer Science and Chemistry at the University of Liverpool have developed BORA – a “research assistant” powered by AI that helps accelerate scientific discovery. By combining the reasoning abilities of large language models (LLMs) with Bayesian optimisation, BORA can sift through complex, high-dimensional problems that usually take months of trial and error. Instead of getting stuck or wasting time, it uses literature-guided insights to suggest smarter experiments, explain its choices, and adapt on the fly. Tested on real challenges from solar energy to crop yields and new materials, BORA consistently found better solutions faster. This breakthrough shows how AI can act as a creative partner in science – not just crunching numbers but helping researchers explore uncharted territory with speed and confidence. The work of BORA has been published in the International Joint Conference of Artificial Intelligence (IJCAI), 2025 in Montreal.
Read more: https://ijcai-preprints.s3.us-west-1.amazonaws.com/2025/4998.pdf.
The prediction of material structures and properties, particularly organic materials, remains an important challenge to chemists. Here the focus is to address the limited availability of data for training ‘inverse design’ methods for organic materials and to provide a step change in the quantity of data available to describe relationships between molecular structure, crystal structure, and functional properties.
University of Southampton
Prof. Graeme Day, Prof. Jeremy Frey

Research Update 1
AIchemy researchers based at the University of Southampton have created the largest dataset of computed organic crystal structures to date, performing over one thousand large-scale crystal structure prediction (CSP) simulations. CSP enables chemists to explore enormous areas of chemical space and explore the full range of possible crystal packings for a molecule, providing insights beyond the structures observed experimentally. In this project, the team successfully reproduced over 99% of known crystal structures and correctly identified the most stable forms in the majority of cases. The resulting dataset of more than four million hypothetical structures is openly available and has already been used to train proof-of-concept machine learning models. These include a neural-network correction that improves the accuracy of energy rankings and a message-passing neural network that can re-optimise predicted structures to better match experimental data. This work demonstrates the predictive power of CSP methods when applied at scale and establishes a valuable resource for advancing computational materials discovery. The study has been reported in Faraday Discussions (DOI: 10.1039/d4fd00105b) and provides a foundation for new applications of artificial intelligence in the design and understanding of molecular crystals.
Read this study here: https://pubs.rsc.org/en/content/articlelanding/2025/fd/d4fd00105b
The literature available describing research undertaken on material classes and functions is vast and at times under utilised. This work aims to build and execute a ‘design-to-device’ pipeline for materials discovery for a chemical application via: AI-driven data sourcing, machine-learning predictions and high throughput experimental validation.

University of Cambridge
Prof. Jacqui Cole
Research Update 1
The AIchemy team at the University of Cambridge have developed machine learning models that can automatically determine molecular substructures directly from experimental nuclear magnetic resonance (NMR) spectra. Traditionally, interpreting NMR spectra requires expert analysis, which is time consuming and prone to human error. By training neural networks on more than 34,000 carbon-13 (¹³C) and 17,000 proton (¹H) spectra, the team showed that computational models can accurately correlate spectral features with molecular structure. They explored several architectures, including a multilayer perceptron combined with long short-term memory (MLP+LSTM) and convolutional neural networks (CNNs). The best-performing models achieved up to 88% accuracy for ¹³C NMR data when experimental metadata such as field strength, temperature, and solvent were included. CNN models offered a practical balance, providing high predictive accuracy while requiring only one-third of the computation time of LSTM-based models. Case studies on compounds such as aspirin, caffeine, and beta-sitosterol demonstrated that the models can correctly assign most substructures from spectra alone. This work highlights the potential of AI to accelerate chemical analysis and supports the development of automated workflows in spectroscopy and materials discovery.
Explore this work further here: https://pubs.acs.org/doi/10.1021/acs.jcim.5c00499
Chemical discovery through generative AI can identify novel molecule and materials classes with tailored compositions, structures and functions in areas of societal need. This will be achieved by using scalable generative models that harvest comprehensive chemical knowledge from large-scale multi-modal chemistry data to learn representations with desirable chemistry properties.

Imperial College London
Prof. Kim Jelfs, Dr. Alex Ganose
Research Update 1
Researchers from the Departments of Chemistry and Computer Science at Imperial College London are developing machine learning models to accelerate molecular structure elucidation from spectroscopy, with applications in drug discovery, metabolomics, and materials science.
Their approach builds on recent advances in fragment-based, spectra-conditioned generative modelling for molecular structures, and introduces two key extensions:
(1) scaling up to a large chemical space using a recently proposed multimodal spectroscopic dataset of ~790k molecule-spectra pairs, including compounds with up to 35 heavy atoms and 9 distinct chemical species.
(2) designing a hierarchical, multi-spectral encoder that combines information from IR and NMR input spectra.
In addition, they are benchmarking transformer baselines that directly translate raw spectra into molecular representations (SMILES/SELFIES), providing systematic comparisons across different methodologies. Looking ahead, our system can be fine-tuned on experimental datasets of molecule-spectra pairs as they become available and has the potential to be deployed as a web interface where users can upload spectra and obtain candidate molecular structures.
The ability and speed at which we are able predict new materials is continuously increasing. This identifies the need for new methods to accelerate the rate at which materials can move from ‘prediction’ to reality.
Here, we set out to develop an automated platform for reaction mechanism discovery employing multifidelity ML models that combine computational and spectroscopic data.
Imperial College London
Dr. Alex Ganose, Dr. Becky Greenaway, Prof. Kim Jelfs, Prof. Ruth Misener

Research Update 1
Researchers from the Departments of Chemistry and Computer Science at Imperial College London are leveraging multi-fidelity AI to accelerate the discovery of new catalytic systems and the understanding of reaction mechanisms. The team have developed a tailored data-driven pipeline focused on porous materials for catalysis, integrating high-throughput screening with reaction data curation, transfer learning and Bayesian optimisation\. By combining insights from in-house experiments with diverse external datasets, the researchers are optimising data-efficient AI models to identify improved conditions and emerging design principles for next-generation catalytic materials.