Prediction of the Chemical Context for Buchwald-Hartwig Coupling Reactions
Article in scientific journal, 2022

We present machine learning models for predicting the chemical context for Buchwald-Hartwig coupling reactions, i.e., which chemicals to add to the reactants to obtain a productive reaction. Using reaction data from in-house electronic lab notebooks, we train two models: one based on single-label data and one based on multi-label data. Both models achieve a high top-3 accuracy of approximately 90%, which indicates strong predictivity. Furthermore, including multi-label data appears to be advantageous, because the multi-label model shows higher accuracy and better sensitivity for the individual contexts than the single-label model. Although the models perform well, we also show that such models need to be retrained periodically, because the usage of different contexts has a strong temporal character. A model trained on historical data will therefore become less useful over time as newer and better contexts emerge and replace older ones. We hypothesize that such significant transitions in context usage are likely to affect any context prediction model trained on historical data. Consequently, training context prediction models warrants careful planning of which data is used for training and how often the model needs to be retrained.
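The abstract reports top-3 accuracy and argues that context usage drifts over time. The following is a minimal sketch to make both ideas concrete, not the authors' code or evaluation protocol. It assumes that a prediction counts as correct when any recorded context of a reaction appears among the model's three highest-scoring contexts (a natural reading for multi-label data), and that drift is probed by holding out all reactions recorded after a cutoff date. The function names top_k_accuracy and chronological_split, and all toy data, are hypothetical.

```python
from datetime import date
from typing import List, Sequence, Set, Tuple

import numpy as np


def top_k_accuracy(scores: np.ndarray, true_labels: Sequence[Set[int]], k: int = 3) -> float:
    """Fraction of reactions for which at least one recorded context index
    appears among the k highest-scoring predicted contexts.

    scores      -- array of shape (n_reactions, n_contexts), e.g. model probabilities
    true_labels -- per reaction, the set of context indices recorded in the data
                   (a one-element set for single-label data)
    """
    if scores.shape[0] != len(true_labels):
        raise ValueError("scores and true_labels must have the same number of rows")
    # Column indices of the k largest scores in each row (order within the top k is irrelevant).
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = [bool(set(row.tolist()) & labels) for row, labels in zip(top_k, true_labels)]
    return float(np.mean(hits))


def chronological_split(dates: Sequence[date], cutoff: date) -> Tuple[List[int], List[int]]:
    """Indices of reactions recorded before the cutoff (train) and on or after it (test)."""
    train = [i for i, d in enumerate(dates) if d < cutoff]
    test = [i for i, d in enumerate(dates) if d >= cutoff]
    return train, test


# Toy data: 4 reactions, 5 hypothetical context classes (not data from the paper).
scores = np.array([
    [0.05, 0.50, 0.20, 0.15, 0.10],
    [0.40, 0.05, 0.25, 0.10, 0.20],
    [0.10, 0.20, 0.05, 0.30, 0.35],
    [0.30, 0.25, 0.20, 0.15, 0.10],
])
true_labels = [{3}, {1}, {0, 1}, {0}]
print(top_k_accuracy(scores, true_labels, k=1))  # 0.25
print(top_k_accuracy(scores, true_labels, k=3))  # 0.75

dates = [date(2018, 5, 1), date(2019, 3, 12), date(2020, 7, 30), date(2021, 1, 4)]
train_idx, test_idx = chronological_split(dates, cutoff=date(2020, 1, 1))
print(train_idx, test_idx)  # [0, 1] [2, 3]
```

Training on the earlier indices and scoring the later ones, repeated for several cutoffs, is one way to observe the loss of usefulness over time that the abstract describes; a random split would mix old and new contexts and could hide that effect.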

CASP

context prediction

condition prediction

Buchwald-Hartwig coupling reactions

Authors

Samuel Genheden

AstraZeneca AB

Agnes Mårdh

Student at Chalmers

AstraZeneca AB

Gustav Lahti

AstraZeneca AB

Student at Chalmers

Ola Engkvist

AstraZeneca AB

Chalmers, Computer Science and Engineering

Simon Olsson

Chalmers, Computer Science and Engineering, Data Science and AI, Data Science and AI 1

Thierry Kogej

AstraZeneca AB

Molecular Informatics

1868-1743 (ISSN) 1868-1751 (eISSN)

Vol. In Press

Subject Categories

Other Computer and Information Science

Subatomic Physics

Probability Theory and Statistics

DOI

10.1002/minf.202100294

PubMed

35122702

Last updated

2022-03-08