Human-in-the-loop active learning for goal-oriented molecule generation

Yasmine Nahal; Janosch Menke; Julien Martinelli; Markus Heinonen; Mikhail Kabeshov; Jon Paul Janet; Eva Nittinger; Ola Engkvist; Samuel Kaski

doi:10.1186/s13321-024-00924-y

Journal article, 2024

Machine learning (ML) systems have enabled the modelling of quantitative structure-property relationships (QSPR) and structure-activity relationships (QSAR) using existing experimental data to predict target properties for new molecules. These property predictors hold significant potential in accelerating drug discovery by guiding generative artificial intelligence (AI) agents to explore desired chemical spaces. However, they often struggle to generalize due to the limited scope of the training data. When optimized by generative agents, this limitation can result in the generation of molecules with artificially high predicted probabilities of satisfying target properties, which subsequently fail experimental validation. To address this challenge, we propose an adaptive approach that integrates active learning (AL) and iterative feedback to refine property predictors, thereby improving the outcomes of their optimization by generative AI agents. Our method leverages the Expected Predictive Information Gain (EPIG) criterion to select additional molecules for evaluation by an oracle. This process aims to provide the greatest reduction in predictive uncertainty, enabling more accurate model evaluations of subsequently generated molecules. Recognizing the impracticality of immediate wet-lab or physics-based experiments due to time and logistical constraints, we propose leveraging human experts for their cost-effectiveness and domain knowledge to effectively augment property predictors, bridging gaps in the limited training data. Empirical evaluations through both simulated and real human-in-the-loop experiments demonstrate that our approach refines property predictors to better align with oracle assessments. Additionally, we observe improved accuracy of predicted properties as well as improved drug-likeness among the top-ranking generated molecules.

Scientific contribution

We present an adaptable framework that integrates AL and human expertise to refine property predictors for goal-oriented molecule generation. This approach is robust to noise in human feedback and ensures that navigating chemical space with human-refined predictors leverages human insights to identify molecules that not only satisfy predicted property profiles but also score highly on oracle models. Additionally, it prioritizes practical characteristics such as drug-likeness, synthetic accessibility, and a favorable balance between exploring diverse chemical space and exploiting similarity to existing training data.
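The abstract names the EPIG criterion without spelling it out. As a rough illustration only, the sketch below estimates EPIG for a classifier from posterior samples (for example, ensemble members): each candidate is scored by the average mutual information between its label and the labels of a set of target molecules, and the highest-scoring candidates would be sent to the oracle or human expert. The function names `epig_scores` and `select_batch`, the array shapes, and the use of an ensemble are assumptions made for this example; this is not the authors' implementation, which is available in the related dataset repository listed below.

```python
import numpy as np


def epig_scores(probs_pool, probs_targ, eps=1e-12):
    """Monte Carlo estimate of EPIG for each candidate (hypothetical helper).

    probs_pool: array (K, N, C), class probabilities from K posterior samples
                for N candidate molecules.
    probs_targ: array (K, M, C), class probabilities from the same K samples
                for M target molecules (e.g. recently generated ones).
    Returns an array of N scores; higher means more expected information.
    """
    k = probs_pool.shape[0]
    # Joint predictive p(y, y* | x, x*), averaged over posterior samples.
    joint = np.einsum("knc,kmd->nmcd", probs_pool, probs_targ) / k
    # Product of the two marginal predictives p(y | x) p(y* | x*).
    marg_pool = probs_pool.mean(axis=0)  # (N, C)
    marg_targ = probs_targ.mean(axis=0)  # (M, C)
    indep = marg_pool[:, None, :, None] * marg_targ[None, :, None, :]
    # Mutual information = KL(joint || product of marginals), per (x, x*) pair.
    mi = np.sum(joint * (np.log(joint + eps) - np.log(indep + eps)), axis=(2, 3))
    # EPIG averages the mutual information over the target molecules.
    return mi.mean(axis=1)


def select_batch(scores, batch_size):
    """Indices of the batch_size highest-scoring candidates (hypothetical helper)."""
    return np.argsort(-scores)[:batch_size]
```

Under these assumptions, a candidate on which all posterior samples agree receives a score near zero, while a candidate whose disagreement across samples is correlated with disagreement on the target molecules receives a positive score and is queried first.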

Active learning

Goal-oriented molecule generation

Interactive algorithms

Human-in-the-loop

Machine learning

Authors

Yasmine Nahal

Aalto-Yliopisto

AstraZeneca AB

Janosch Menke

Chalmers, Computer Science and Engineering, Data Science and AI

Julien Martinelli

Université de Bordeaux

Markus Heinonen

Aalto-Yliopisto

Mikhail Kabeshov

AstraZeneca AB

Jon Paul Janet

AstraZeneca AB

Eva Nittinger

AstraZeneca AB

Ola Engkvist

Chalmers, Computer Science and Engineering

AstraZeneca AB

Samuel Kaski

University of Manchester

Aalto-Yliopisto

Journal of Cheminformatics

1758-2946 (ISSN) 17582946 (eISSN)

Vol. 16, issue 1, article 138

Subject categories (SSIF 2011)

Computer science

DOI

10.1186/s13321-024-00924-y

PubMed

39654043

Related datasets

yasminenahal/hitl-al-gomg: hitl-al-gomg 1.0.5 [dataset]

URI: https://github.com/yasminenahal/hitl-al-gomg
DOI: 10.5281/zenodo.14166168

More information

Last updated

2025-03-13
