ML model-building challenge: predicting reaction yields for a catechol rearrangement reaction from a transient flow data-set.
Overview:
Welcome to the Catechol Benchmark Hackathon competition!
In this competition, multiple teams will try to predict the reaction outcomes of the rearrangement of allyl-substituted catechol under different solvent and process conditions.
The data-set consists of multiple transient flow ramps, which allow us to assess the amount of starting material and products after running the reaction at different temperatures and residence times (i.e. how long the chemicals reacted for). We also include many data-points for binary mixtures of solvents, allowing us to treat the usually discrete solvent selection problem as a semi-continuous one.
Goal: Build a machine learning model that achieves the best predictions on the collected data, as measured by a cross-validation procedure, which will demonstrate the ability of your model to predict on unseen solvent data.
Description
More details of the data-set:
Data size and inputs
The data-set consists of 1227 data-points on the allyl-substituted catechol reaction, covering 24 solvents at different temperatures and residence times. The inputs of the model will consist of:
(1) A selection of two different solvents, Solvent A and Solvent B, with the corresponding amount of Solvent B in the mixture given by the percentage %B.
(2) The temperature in °C at which the reaction was carried out.
(3) The residence time of the reaction, i.e., how long the reactants were subject to the reaction conditions applied.
The outputs consist of the yields of the starting material and of the two observed products. We also created a smaller data-set of 656 data-points in which solvent mixtures are not considered: it contains only single-solvent data, along with the corresponding residence times and temperatures.
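To make the input description above concrete, here is a minimal sketch of one possible way to turn a single experiment into a numeric feature vector. It is an illustration under stated assumptions, not the encoding used by the organisers: the Experiment record, the featurize function, the solvent names and the two-number solvent descriptors are all hypothetical placeholders. The sketch blends the descriptors of Solvent A and Solvent B linearly by %B, which is one simple way to treat solvent selection as semi-continuous.

```python
from dataclasses import dataclass

# Hypothetical descriptors per solvent. The names and numbers are
# placeholders for illustration only, not values from the data-set.
SOLVENT_DESCRIPTORS = {
    "solvent_1": [1.0, 0.0],
    "solvent_2": [0.0, 2.0],
}


@dataclass
class Experiment:
    """One data-point, following the three inputs listed above."""
    solvent_a: str
    solvent_b: str
    percent_b: float       # %B: percentage of Solvent B, from 0 to 100
    temperature_c: float   # temperature in degrees Celsius
    residence_time: float  # how long the reactants were under reaction conditions


def featurize(exp):
    """Blend the two solvent descriptors by %B, then append the process conditions."""
    frac_b = exp.percent_b / 100.0
    desc_a = SOLVENT_DESCRIPTORS[exp.solvent_a]
    desc_b = SOLVENT_DESCRIPTORS[exp.solvent_b]
    # Linear interpolation is an assumption; real mixtures need not behave linearly.
    mixed = [(1.0 - frac_b) * a + frac_b * b for a, b in zip(desc_a, desc_b)]
    return mixed + [exp.temperature_c, exp.residence_time]


if __name__ == "__main__":
    example = Experiment("solvent_1", "solvent_2", 50.0, 175.0, 2.0)
    print(featurize(example))
```

With this encoding, a single-solvent experiment is simply the special case where %B is 0 (or where Solvent A and Solvent B are the same), so the same function could serve both the full and the smaller data-set.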
Evaluation
Submissions will be evaluated according to a cross-validation procedure. This public notebook (https://www.kaggle.com/code/josepablofolch/catechol-benchmark-hackathon-template) shows the structure any submitted notebook must follow. In order to ensure fair participation among all competitors, the submission must have the same last three cells as in the notebook template, with the only allowed change being the line where the model is defined.
For the avoidance of doubt, the line model = MLPModel() can be replaced with a new model definition in the third to last and second to last cells, but everything else must remain the same.
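The exact cross-validation procedure is defined by the template notebook and is not reproduced here. As a rough illustration of what evaluating on unseen solvent data can look like, the sketch below holds out every data-point of one solvent at a time and scores a trivial mean predictor on it. The function names, the mean-squared-error metric and the toy values are assumptions made for this example, not part of the competition code.

```python
def leave_one_solvent_out(solvents):
    """Yield (held_out, train_idx, test_idx) once per distinct solvent label."""
    for held_out in sorted(set(solvents)):
        test_idx = [i for i, s in enumerate(solvents) if s == held_out]
        train_idx = [i for i, s in enumerate(solvents) if s != held_out]
        yield held_out, train_idx, test_idx


def mean_baseline_mse(yields, train_idx, test_idx):
    """Predict the training-set mean yield and return the MSE on the held-out points."""
    mean = sum(yields[i] for i in train_idx) / len(train_idx)
    return sum((yields[i] - mean) ** 2 for i in test_idx) / len(test_idx)


if __name__ == "__main__":
    # Placeholder labels and yields, not real measurements.
    solvents = ["solvent_1", "solvent_1", "solvent_2", "solvent_3"]
    yields = [0.2, 0.4, 0.6, 0.8]
    for held_out, train_idx, test_idx in leave_one_solvent_out(solvents):
        print(held_out, mean_baseline_mse(yields, train_idx, test_idx))
```

Splitting by solvent rather than at random keeps every measurement of the held-out solvent out of training, so the score reflects how well a model extrapolates to a solvent it has never seen.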
Prizes
Prizes will be awarded on a per-person basis as follows:
Total Prizes Available: £2,000 (GBP)
- 1st Place – £250 per person (maximum £1,000 total for a team of four)
- 2nd Place – £150 per person (maximum £600 total for a team of four)
- 3rd Place – £100 per person (maximum £400 total for a team of four)

