A methodology to correctly assess the applicability domain of cell membrane permeability predictors for cyclic peptides
Journal article, 2024

Being able to predict the cell permeability of cyclic peptides is essential for unlocking their potential as a drug modality for intracellular targets. Because cell permeability studies are wide-ranging but the number of data points is limited, the reliability of machine learning (ML) models in previously unexplored regions of chemical space becomes a challenge. In this work, we systematically investigate the predictive capability of ML models from the perspective of their extrapolation to never-before-seen applicability domains, with a particular focus on the permeability task. Four predictive algorithms, namely Support Vector Machine, Random Forest, LightGBM and XGBoost, were employed jointly with a conformal prediction framework to characterize and evaluate applicability through uncertainty quantification. The efficiency and validity of the models' predictions under multiple calibration strategies were assessed on several external datasets from different parts of chemical space through a set of experiments. The experiments showed that predictors that generalize well to the applicability domain defined by the training data can fail to achieve similar performance in other parts of chemical space. Our study proposes an approach to overcome such limitations by improving the efficiency of the models without sacrificing validity. The trade-off between reliability and informativeness was balanced when the models were calibrated with a subset of the data from the new target domain. This study outlines an approach to extend predictive power and restore the models' reliability via a recalibration strategy, without the need to retrain the underlying model.
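The recalibration idea in the abstract can be illustrated with a minimal sketch of inductive conformal prediction for a binary permeability label. Everything here is an assumption made for illustration and is not taken from the paper: the frozen model is a hypothetical one-dimensional logistic function (`toy_model_proba`), the data are synthetic, the nonconformity score is one minus the predicted class probability, and calibration is class-conditional (Mondrian). Validity is measured as the share of prediction sets that contain the true label, and efficiency as the share of single-label sets.

```python
import math
import random


def toy_model_proba(x):
    """Hypothetical frozen model: P(permeable | x) for a 1-D descriptor x."""
    return 1.0 / (1.0 + math.exp(-x))


def nonconformity(x, label):
    """Score = 1 - predicted probability of the candidate label."""
    p1 = toy_model_proba(x)
    return 1.0 - (p1 if label == 1 else 1.0 - p1)


def calibrate(cal_x, cal_y):
    """Class-conditional calibration: one list of scores per class."""
    scores = {0: [], 1: []}
    for x, y in zip(cal_x, cal_y):
        scores[y].append(nonconformity(x, y))
    return scores


def p_value(cal_scores, score):
    """Share of calibration scores at least as large as `score` (with +1 smoothing)."""
    n_ge = sum(1 for s in cal_scores if s >= score)
    return (n_ge + 1) / (len(cal_scores) + 1)


def predict_set(scores, x, significance):
    """Keep every label whose p-value exceeds the significance level."""
    return {
        label
        for label in (0, 1)
        if p_value(scores[label], nonconformity(x, label)) > significance
    }


def evaluate(scores, xs, ys, significance):
    """Return (validity, efficiency) over a labelled evaluation set."""
    sets = [predict_set(scores, x, significance) for x in xs]
    validity = sum(y in s for s, y in zip(sets, ys)) / len(ys)
    efficiency = sum(len(s) == 1 for s in sets) / len(ys)
    return validity, efficiency


def sample_domain(rng, n, shift):
    """Synthetic domain: labels follow logistic(x - shift); shift != 0 mimics a new domain."""
    xs = [rng.gauss(0.0, 2.0) for _ in range(n)]
    ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(x - shift))) else 0 for x in xs]
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    significance = 0.2

    # Calibrate on the original domain, then evaluate on a shifted one.
    src_x, src_y = sample_domain(rng, 500, shift=0.0)
    tgt_x, tgt_y = sample_domain(rng, 1500, shift=2.0)
    original = calibrate(src_x, src_y)

    # Recalibrate with a subset of the new domain; the model itself is untouched.
    recalibrated = calibrate(tgt_x[:500], tgt_y[:500])

    for name, scores in (("original", original), ("recalibrated", recalibrated)):
        validity, efficiency = evaluate(scores, tgt_x[500:], tgt_y[500:], significance)
        print(f"{name}: validity={validity:.3f} efficiency={efficiency:.3f}")
```

Only the calibration scores change between the two runs; `toy_model_proba` is never refitted, which mirrors the abstract's point that recalibration does not require retraining the underlying model.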

Authors

Gökçe Geylan

Chalmers, Life Sciences, Systems Biology

AstraZeneca AB

Leonardo De Maria

AstraZeneca AB

Ola Engkvist

AstraZeneca AB

Chalmers, Computer Science and Engineering

Florian David

Chalmers, Life Sciences, Systems Biology

Ulf Norinder

Örebro University

Uppsala University

Stockholm University

Digital Discovery

2635-098X (eISSN)

Vol. In Press

Subject categories

Biological Sciences

Chemistry

DOI

10.1039/d4dd00056k

More information

Last updated

2024-08-13