Practically Significant Method Comparison Protocols for Machine Learning in Small Molecule Drug Discovery

Jeremy R. Ash; Cas Wognum; Raquel Rodríguez-Pérez; Matteo Aldeghi; Alan C. Cheng; Djork Arné Clevert; Ola Engkvist; Cheng Fang; Daniel J. Price; Jacqueline M. Hughes-Oliver; W. Patrick Walters

doi:10.1021/acs.jcim.5c01609

Review article, 2025

Machine Learning (ML) methods that relate molecular structure to properties are frequently proposed as in silico surrogates for expensive or time-consuming experiments. In small molecule drug discovery, such methods inform high-stakes decisions like compound synthesis and in vivo studies. This application lies at the intersection of multiple scientific disciplines. When comparing new ML methods to baseline or state-of-the-art approaches, statistically rigorous method comparison protocols and domain-appropriate performance metrics are essential to ensure replicability and ultimately the adoption of ML in small molecule drug discovery. This paper proposes a set of guidelines to incentivize rigorous and domain-appropriate techniques for method comparison tailored to small molecule property modeling. These guidelines, accompanied by annotated examples using open-source software tools, lay a foundation for robust ML benchmarking and thus the development of more impactful methods.

Authors

Jeremy R. Ash

Johnson & Johnson Innovative Medicine

Cas Wognum

Recursion Pharmaceuticals

Valence Discovery

Raquel Rodríguez-Pérez

Novartis International AG

Matteo Aldeghi

Bayer AG

Alan C. Cheng

Merck & Co., Inc.

Djork Arné Clevert

Pfizer

Ola Engkvist

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

AstraZeneca AB


Cheng Fang

Blueprint Medicines Corporation

Daniel J. Price

Nimbus Therapeutics

Jacqueline M. Hughes-Oliver

North Carolina State University

W. Patrick Walters

Relay Therapeutics

Journal of Chemical Information and Modeling

1549-9596 (ISSN) 1549-960X (eISSN)

Vol. 65, Issue 18, p. 9398-9411

Subject Categories (SSIF 2025)

Bioinformatics (Computational Biology)

Computer Sciences

Signal Processing

DOI

10.1021/acs.jcim.5c01609


PubMed

40932128

More information

Latest update

10/1/2025
