Prediction of Permeability and Efflux Using Multitask Learning
Journal article, 2025
In silico prediction of cell membrane permeability is crucial in drug discovery, since a compound's permeation through membranes influences parameters such as its in vivo efficacy, bioavailability, and pharmacokinetics. This study investigates the use of multitask graph neural networks to predict a selection of permeability-related endpoints. The study used a harmonized, single-laboratory internal data set of over 10K compounds measured in human colorectal adenocarcinoma (Caco-2) and Madin-Darby canine kidney (MDCK) cell lines, both routinely employed in experimental assays for drug permeability and efflux. This data set is an order of magnitude larger than comparable public collections, and thus provides greater statistical power and a consistent error profile for model development. A series of multitask learning (MTL) models trained on these data were benchmarked against single-task approaches and evaluated on an external public data set to investigate the models' applicability domain. The comparison between single- and multitask models suggests that MTL can achieve higher accuracy by leveraging shared information across endpoints. MTL is also shown to perform better when augmented with molecular features. In particular, the inclusion of pKa and LogD is shown to improve accuracy for both permeability and efflux endpoints. This work presents benchmarking results for models using different data splitting strategies, accompanied by guidelines for optimal validation in the context of MTL.
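As a rough illustration of how a multitask model can share information across endpoints when not every compound has been measured in every assay, the sketch below computes a per-endpoint mean squared error that skips missing labels. This is not the authors' implementation: the function name `masked_multitask_mse`, the two-endpoint layout, and all numbers are hypothetical, and the abstract does not state which loss was used.

```python
import math


def masked_multitask_mse(predictions, targets):
    """Per-task MSE over labeled entries only, plus the mean across tasks.

    predictions, targets: lists of rows, one row per compound and one
    column per endpoint. A missing measurement is None or NaN.
    """
    n_tasks = len(targets[0])
    sums = [0.0] * n_tasks      # running squared error per task
    counts = [0] * n_tasks      # number of labeled compounds per task

    for pred_row, target_row in zip(predictions, targets):
        for task, (pred, target) in enumerate(zip(pred_row, target_row)):
            if target is None or math.isnan(target):
                continue        # unmeasured endpoint: contributes nothing
            sums[task] += (pred - target) ** 2
            counts[task] += 1

    # Tasks without any label get None rather than a misleading zero.
    per_task = [s / c if c else None for s, c in zip(sums, counts)]
    labeled = [v for v in per_task if v is not None]
    overall = sum(labeled) / len(labeled) if labeled else None
    return overall, per_task


# Illustrative values only: column 0 = a permeability endpoint,
# column 1 = an efflux endpoint (both hypothetical).
example_predictions = [[1.5, 0.0], [2.0, 3.0]]
example_targets = [[1.0, float("nan")], [2.0, 4.0]]
overall_loss, per_task_loss = masked_multitask_mse(example_predictions, example_targets)
```

Averaging per endpoint before averaging across endpoints is one possible design choice; it keeps a sparsely measured endpoint from being drowned out by a densely measured one.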
Permeability
Antibiotic resistance
Assays
Peptides and proteins
Cells