Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data
Review article, 2023

The past decade has seen a number of impressive developments in predictive chemistry and reaction informatics driven by machine learning applications to computer-aided synthesis planning. While many of these developments have been made even with relatively small, bespoke datasets, in order to advance the role of AI in the field at scale, there must be significant improvements in the reporting of reaction data. Currently, the majority of publicly available data is reported in an unstructured format and heavily imbalanced toward high-yielding reactions, which influences the types of models that can be successfully trained. In this Perspective, we analyze several data curation and sharing initiatives that have seen success in chemistry and molecular biology. We discuss several factors that have contributed to their success and how we can take lessons from these case studies and apply them to reaction data. Finally, we spotlight the Open Reaction Database and summarize key actions the community can take toward making reaction data more findable, accessible, interoperable, and reusable (FAIR), including the use of mandates from funding agencies and publishers.

Authors

Rocio Mercado

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Steven M. Kearnes

Relay Therapeutics

Connor W. Coley

Massachusetts Institute of Technology (MIT)

Journal of Chemical Information and Modeling

1549-9596 (ISSN) 1549-960X (eISSN)

Vol. 63, Issue 14, pp. 4253-4265

Subject Categories (SSIF 2011)

Bioinformatics and Systems Biology

DOI

10.1021/acs.jcim.3c00607

PubMed

37405398

More information

Latest update

4/11/2024