Network Analysis of the Organic Chemistry in Patents, Literature, and Pharmaceutical Industry
Journal article, 2025

Chemical reactions can be connected in large networks such as knowledge graphs. Prior work has used such networks to draw meaningful conclusions about the properties and structures involved in organic chemistry reactions. However, that research has focused on public sources of organic synthesis, which might lack the intricate details of the synthetic routes used in in-house drug discovery. In this work, we extend previous analyses to include an in-house electronic lab notebook (ELN) source and compare it to knowledge graphs constructed from the US Patent and Trademark Office (USPTO) and Reaxys. We found that the Reaxys knowledge graph is the most interconnected and has the largest proportion of nodes belonging to the core, whereas the USPTO knowledge graph is much less connected and has only a small core. The ELN knowledge graph falls between these extremes in connectivity and does not have any core. The hub molecules of ELN and USPTO are most similar, primarily represented by small organic building blocks. We hypothesize that these differences can be attributed to the different origins of the data in the three sources. We discuss what impact this might have on synthesis prediction modelling.
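The quantities named in the abstract (hubs, core, connectivity) can be illustrated on a toy reaction network. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the reactions are invented placeholders, the graph links each reactant directly to each product (the article's knowledge graphs may be structured differently), a "hub" is taken to be the highest-degree molecule, and the "core" is taken to be the largest strongly connected component, a definition the abstract itself does not spell out.

```python
from collections import defaultdict

# Hypothetical toy data: (reactants, products) pairs with placeholder names.
REACTIONS = [
    (["A", "B"], ["C"]),
    (["C"], ["D"]),
    (["D"], ["A"]),
    (["B", "E"], ["F"]),
]


def build_graph(reactions):
    """Directed molecule graph: one edge from each reactant to each product."""
    nodes, succ = set(), defaultdict(set)
    for reactants, products in reactions:
        nodes.update(reactants)
        nodes.update(products)
        for r in reactants:
            for p in products:
                succ[r].add(p)
    return nodes, succ


def degrees(nodes, succ):
    """Total degree (in + out) per molecule; high-degree nodes act as hubs."""
    deg = {n: 0 for n in nodes}
    for src, targets in succ.items():
        for dst in targets:
            deg[src] += 1
            deg[dst] += 1
    return deg


def largest_scc(nodes, succ):
    """Largest strongly connected component via Kosaraju's algorithm.

    Assumption: this component stands in for the 'core' of the network.
    """
    pred = defaultdict(set)
    for src, targets in succ.items():
        for dst in targets:
            pred[dst].add(src)

    # Pass 1: iterative DFS on the forward graph, recording finish order.
    order, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(sorted(succ.get(start, ()))))]
        while stack:
            node, it = stack[-1]
            nxt = next((n for n in it if n not in seen), None)
            if nxt is None:
                order.append(node)
                stack.pop()
            else:
                seen.add(nxt)
                stack.append((nxt, iter(sorted(succ.get(nxt, ())))))

    # Pass 2: DFS on the reversed graph in reverse finish order.
    best, assigned = set(), set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = set(), [start]
        while todo:
            node = todo.pop()
            component.add(node)
            for p in pred.get(node, ()):
                if p not in assigned:
                    assigned.add(p)
                    todo.append(p)
        if len(component) > len(best):
            best = component
    return best


nodes, succ = build_graph(REACTIONS)
deg = degrees(nodes, succ)
hub = max(deg, key=deg.get)            # molecule with the highest degree
core = largest_scc(nodes, succ)        # assumed stand-in for the core
core_fraction = len(core) / len(nodes) # proportion of nodes in the core
```

Under these assumptions, comparing `core_fraction` and the top-ranked entries of `deg` across graphs built from different sources mirrors the kind of comparison the abstract describes between Reaxys, USPTO, and ELN.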

graph analysis

knowledge graph

organic chemistry

chemical reactions

Authors

Emma Svensson

AstraZeneca AB

Johannes Kepler University of Linz (JKU)

Emma Granqvist

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

AstraZeneca AB

Tomas Bastys

AstraZeneca AB

Christos Kannas

AstraZeneca AB

Mikhail Kabeshov

AstraZeneca AB

Samuel Genheden

AstraZeneca AB

Ola Engkvist

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

AstraZeneca AB

Thierry Kogej

AstraZeneca AB

Molecular Informatics

1868-1743 (ISSN) 1868-1751 (eISSN)

Vol. 44, Issue 7, e202500011

Subject Categories (SSIF 2025)

Organic Chemistry

DOI

10.1002/minf.202500011

PubMed

40679105
