LAGOM: A transformer-based chemical language model for drug metabolite prediction
Journal article, 2025

Metabolite identification studies are an essential but costly and time-consuming component of drug development. Computational methods have the potential to accelerate early-stage drug discovery, and recent advances in deep learning offer new opportunities for metabolite prediction. We present LAGOM (Language-model Assisted Generation Of Metabolites), a Transformer-based approach built upon the Chemformer architecture, designed to predict likely metabolic transformations of drug candidates. Our results show that LAGOM performs competitively with, and in some cases surpasses, existing state-of-the-art metabolite prediction tools, demonstrating the potential of language-model-based architectures in chemoinformatics. By integrating diverse data sources and employing data augmentation strategies, we further improve the model's generalisation and predictive accuracy. The implementation of LAGOM is publicly available at github.com/tsofiac/LAGOM.

Language models

Artificial intelligence

Transformers

Deep learning

Drug metabolism

Drug discovery

Authors

Sofia Larsson

AstraZeneca AB

University of Gothenburg

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Miranda Carlsson

Student at Chalmers

AstraZeneca AB

Richard Beckmann

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

University of Gothenburg

Filip Miljković

AstraZeneca AB

Rocio Mercado

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

University of Gothenburg

Artificial Intelligence in the Life Sciences

2667-3185 (eISSN)

Vol. 8, 100142

Subject Categories (SSIF 2025)

Bioinformatics (Computational Biology)

DOI

10.1016/j.ailsci.2025.100142

Related datasets

LAGOM: Language-model-Assisted Generation Of Metabolites [dataset]

URI: https://github.com/tsofiac/LAGOM

More information

Latest update

9/25/2025