Protein-ligand binding affinity prediction exploiting sequence constituent homology
Journal article, 2023

MOTIVATION: Molecular docking is a commonly used approach for estimating binding conformations and their resultant binding affinities. Machine learning has been successfully deployed to enhance such affinity estimations. Many methods of varying complexity have been developed, making use of some or all of the spatial and categorical information available in these structures. Such methods have mainly been evaluated on datasets from PDBbind, particularly the Comparative Assessment of Scoring Functions (CASF) 2007, 2013, and 2016 datasets, which have dedicated test sets. This work demonstrates that only a small number of simple descriptors is needed to estimate binding affinity efficiently for these complexes, without knowing the exact binding conformation of the ligand.

RESULTS: The developed approach, which uses a small number of ligand and protein descriptors in conjunction with gradient boosting trees, demonstrates high performance on the CASF datasets. This includes the commonly used CASF2016 benchmark, on which it appears to perform better than any other approach. The methodology is also useful for datasets where the spatial relationship between the ligand and protein is unknown, as demonstrated on a large ChEMBL-derived dataset.

AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/abbiAR/PLBAffinity.
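The RESULTS paragraph describes regressing binding affinity on a handful of ligand and protein descriptors with gradient boosting trees. As a rough illustration of that model family only, the sketch below implements least-squares gradient boosting with depth-1 trees (stumps) in plain Python. It is not the authors' code, which is in the linked repository: the tree depth, learning rate, number of rounds and the three synthetic "descriptors" are assumptions chosen for illustration, and the data are random numbers, not PDBbind or ChEMBL values.

```python
import random


def fit_stump(X, residuals):
    """Fit a depth-1 regression tree: the single-feature threshold split
    that minimises the squared error of the residuals."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv = sum(left) / len(left)
            rv = sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:
        # Every feature is constant: predict the mean residual everywhere.
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_gbt(X, y, n_rounds=100, learning_rate=0.1):
    """Least-squares gradient boosting: each stump is fitted to the current
    residuals (proportional to the negative gradient of the squared loss)
    and added to the ensemble scaled by the learning rate (shrinkage)."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in data: three hypothetical descriptors per complex and a
# made-up affinity-like target. Not real PDBbind or ChEMBL data.
rng = random.Random(0)
X = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(40)]
y = [6.0 + 2.0 * x[0] - 1.5 * x[1] + 0.5 * x[2] + rng.gauss(0, 0.2) for x in X]

model = fit_gbt(X, y)
train_pred = [predict(model, row) for row in X]
baseline_pred = [sum(y) / len(y)] * len(y)
print("boosted training MSE:", mse(y, train_pred))
print("mean-only training MSE:", mse(y, baseline_pred))
```

Because each stump splits on only one descriptor, this toy model is additive in the descriptors; deeper trees, as used in standard gradient boosting libraries, can also capture interactions between ligand and protein descriptors.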

Authors

Abbi Abdel-Rehim

University of Cambridge

Oghenejokpeme I. Orhobor

National Institute of Agricultural Botany

Lou Hang

University College London (UCL)

Hao Ni

University College London (UCL)

Alan Turing Institute

Ross King

Alan Turing Institute

University of Cambridge

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Bioinformatics

1367-4803 (ISSN) 1367-4811 (eISSN)

Vol. 39, Issue 8

Subject Categories (SSIF 2011)

Biochemistry and Molecular Biology

Bioinformatics (Computational Biology)

Theoretical Chemistry

DOI

10.1093/bioinformatics/btad502

PubMed

37572302


Latest update

10/2/2023