Exhaustive local chemical space exploration using a transformer model
Journal article, 2024

How many near-neighbors does a molecule have? This fundamental question in chemistry is crucial for molecular optimization problems under the similarity principle assumption. Generative models can sample molecules from a vast chemical space but lack explicit knowledge about molecular similarity. Therefore, these models need guidance from reinforcement learning to sample the relevant region of similar chemical space. However, they still lack a mechanism to measure the coverage of a specific region of the chemical space. To overcome these limitations, a source-target molecular transformer model, regularized via a similarity kernel function, is proposed. Trained on a very large dataset of ≥200 billion molecular pairs, the model enforces a direct relationship between generating a target molecule and its similarity to a source molecule. Results indicate that the regularization term significantly improves the correlation between generation probability and molecular similarity, enabling exhaustive exploration of molecule near-neighborhoods.
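The link the abstract draws between generation probability and molecular similarity can be illustrated with a small sketch. This is not the authors' implementation: it assumes Tanimoto similarity over sets of fingerprint bits as the similarity measure and a simple pairwise hinge penalty as a stand-in for the regularization term. The function names `tanimoto` and `ranking_penalty`, the toy fingerprints, and the log-probability values are all hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def ranking_penalty(similarities, log_probs):
    """Hypothetical regularizer: sum of hinge penalties over target pairs whose
    log-probability order contradicts their similarity order to the source."""
    penalty = 0.0
    for i, j in combinations(range(len(similarities)), 2):
        if similarities[i] == similarities[j]:
            continue  # ties in similarity impose no ordering
        # more = index of the more similar target, less = the less similar one
        more, less = (i, j) if similarities[i] > similarities[j] else (j, i)
        # Penalize only when the less similar target is assigned higher probability.
        penalty += max(0.0, log_probs[less] - log_probs[more])
    return penalty


# Toy fingerprints (assumed data, not taken from the paper).
source = frozenset({1, 2, 3, 4})
targets = [frozenset({1, 2, 3, 4, 5}), frozenset({1, 2, 7, 8}), frozenset({9, 10})]
sims = [tanimoto(source, t) for t in targets]

consistent = ranking_penalty(sims, [-1.0, -2.0, -5.0])    # order agrees with similarity
inconsistent = ranking_penalty(sims, [-5.0, -2.0, -1.0])  # order is reversed
```

In this sketch the penalty is zero when more similar targets receive higher log-probability and grows as the ordering is violated, which is the kind of behavior a similarity-based regularizer is meant to encourage.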

DRUG DISCOVERY

Authors

Alessandro Tibo

AstraZeneca AB

Jiazhen He

AstraZeneca AB

Jon Paul Janet

AstraZeneca AB

Eva Nittinger

AstraZeneca AB

Ola Engkvist

Chalmers, Computer Science and Engineering (Chalmers)

AstraZeneca AB

Nature Communications

2041-1723 (ISSN) 2041-1723 (eISSN)

Vol. 15, Issue 1, Article 7315

Subject Categories (SSIF 2011)

Chemical Sciences

DOI

10.1038/s41467-024-51672-4

PubMed

39183239

Related datasets

PubChem and ChEMBL-series processed dataset used in Exhaustive local chemical space exploration using a transformer model [dataset]

DOI: 10.5281/zenodo.12818281

More information

Latest update

3/13/2025