E-GuARD: expert-guided augmentation for the robust detection of compounds interfering with biological assays
Journal article, 2025

Abstract: Assay interference caused by small organic compounds continues to pose formidable challenges to early drug discovery. Various computational methods have been developed to identify compounds likely to cause assay interference. However, because little data is available for model development, the predictive accuracy and applicability of these approaches are limited. In this work, we present E-GuARD, a novel framework that seeks to address data scarcity and imbalance by integrating self-distillation, active learning, and expert-guided molecular generation. E-GuARD iteratively enriches the training data with interference-relevant molecules, yielding quantitative structure-interference relationship (QSIR) models with superior performance. We demonstrate the utility of E-GuARD on four high-quality data sets covering thiol reactivity, redox reactivity, nanoluciferase inhibition, and firefly luciferase inhibition. Our models reached Matthews correlation coefficient (MCC) values of up to 0.47 on these data sets, with two-fold or greater improvements in enrichment factors compared with models trained without E-GuARD data augmentation. These results highlight the potential of E-GuARD as a scalable solution for mitigating assay interference in early drug discovery.

Scientific contribution: We present E-GuARD, an innovative framework that combines iterative self-distillation with guided molecular augmentation to enhance the predictive performance of QSAR models. By letting models learn from newly generated, informative compounds over successive iterations, E-GuARD helps them capture underrepresented structural patterns and improves performance on unseen data. Across different interference mechanisms, E-GuARD consistently outperformed standard approaches. E-GuARD lays the foundation for further research into dynamic data enrichment and more robust molecular modeling.
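The abstract reports model quality as MCC values and as enrichment factors. The code below is a minimal sketch of how these two metrics are commonly computed for a binary interference classifier. It is not the authors' evaluation code. The function names `mcc` and `enrichment_factor`, the label convention (1 = interfering compound) and the default top fraction of 10 % are assumptions made for illustration. The abstract does not say which fraction of the ranked list its enrichment factors refer to.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (assumed: 1 = interferent)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def enrichment_factor(
    y_true: Sequence[int], scores: Sequence[float], fraction: float = 0.1
) -> float:
    """Hit rate in the top `fraction` of the score-ranked list divided by the overall hit rate.

    The default fraction of 0.1 is a hypothetical choice, not taken from the source.
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n = len(y_true)
    hits_total = sum(y_true)
    if n == 0 or hits_total == 0:
        return 0.0
    n_top = max(1, round(n * fraction))
    # Rank by descending score; sorted() is stable, so ties keep input order.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (hits_total / n)


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    preds = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.2, 0.8, 0.1, 0.05, 0.3, 0.4, 0.0, 0.15, 0.25]
    print(mcc(labels, preds))                              # 0.375
    print(enrichment_factor(labels, scores, fraction=0.2))  # 2.5
```

In this toy example, one of the two interferents lands in the top 20 % of the ranking. The top-ranked hit rate is therefore 0.5, against a base rate of 0.2, so the EF is 2.5. By this definition, the "two-fold or greater" improvement in the abstract means that ratio at least doubled relative to the baseline model.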

Authors

Vincenzo Palmacci

Universität Wien

Yasmine Nahal

AstraZeneca AB

Aalto-Yliopisto

Matthias Welsch

Universität Wien

Ola Engkvist

Chalmers, Computer Science and Engineering, Data Science and AI

AstraZeneca AB

Samuel Kaski

Aalto-Yliopisto

University of Manchester

Johannes Kirchmair

Universität Wien

Journal of Cheminformatics

1758-2946 (ISSN), 1758-2946 (eISSN)

Vol. 17, Issue 1, Article 64

Subject categories (SSIF 2025)

Bioinformatics (computational biology)

Bioinformatics and computational biology

Computer science (computing science)

DOI

10.1186/s13321-025-01014-3

PubMed

40301942

Last updated

2025-05-14