Improving protein-ligand complex generation with force field guidance
Journal article, 2026

Generative models based on diffusion and flow matching have recently been applied to structure-based drug design, but their outputs often include physically unrealistic protein-ligand interactions. We present an energy guidance framework that incorporates a molecular mechanics force field (MMFF94) directly into the sampling process. The method steers molecular generation toward more physically plausible and energetically stable conformations without retraining the underlying model. We evaluate this approach on the PDBBind dataset using two state-of-the-art architectures: SemlaFlow, a flow matching model, and EDM, a diffusion model. Across both models, energy guidance improves enthalpic interaction energy, reduces strain energy by up to 75%, and generates over 1000 ligands with better docking scores than native ligands. These results demonstrate that lightweight, physics-based guidance can significantly enhance generative drug design while preserving chemical validity and diversity.

Scientific contribution

We introduce a novel, training-free force field guidance framework that steers ligand generation using empirical molecular mechanics (e.g., MMFF94) during diffusion or flow-based sampling, without modifying or retraining the base generative model (e.g., EDM or SemlaFlow [24]). Our method operates as a plug-in at inference time, using energy feedback to generate poses with lower strain and better predicted interactions with the protein structure.

Our main contributions are as follows:

Energy-based guidance without retraining: Unlike methods that require gradients from neural affinity predictors (e.g., BADGER [26]), our approach injects classical force field feedback (MMFF94) directly during the posterior sampling step.

Improved docking and strain metrics: In benchmarks against unconditional EDM and SemlaFlow, our guided inference yields consistently better AutoDock Vina scores and lower ligand strain energy, even after optimizing the final structures using the same force field.

Compatibility and flexibility: Because the guidance module is external, it can be applied to multiple generative backbones without retraining or architecture modifications, and it works with arbitrary differentiable potential energy functions.

Theoretical guarantee of stability: We demonstrate in Appendix B that the gradient correction step corresponds to a descent step on the energy under standard smoothness assumptions. While the full sampling update also includes model-driven (and, in the diffusion case, stochastic) components, this result formalizes how the guidance term locally biases the trajectory toward lower-energy regions and provides a principled justification for its stabilizing effect.
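The idea of adding a gradient correction to each sampling step can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration and is not taken from the paper: a harmonic bond term stands in for MMFF94, `toy_velocity` is a hypothetical stand-in for a trained flow matching network, and the update rule (an Euler step on the velocity followed by a step of size `guidance_scale` along the negative energy gradient) is only one plausible way to combine the two terms.

```python
import numpy as np

def bond_energy(x, bonds, r0=1.5, k=1.0):
    """Toy harmonic bond energy (stand-in for MMFF94): 0.5*k*sum((d - r0)^2)."""
    d = np.linalg.norm(x[bonds[:, 0]] - x[bonds[:, 1]], axis=1)
    return 0.5 * k * np.sum((d - r0) ** 2)

def bond_energy_grad(x, bonds, r0=1.5, k=1.0):
    """Analytic gradient of bond_energy with respect to atom coordinates."""
    i, j = bonds[:, 0], bonds[:, 1]
    diff = x[i] - x[j]
    d = np.linalg.norm(diff, axis=1, keepdims=True)
    g_pair = k * (d - r0) * diff / np.maximum(d, 1e-9)
    grad = np.zeros_like(x)
    np.add.at(grad, i, g_pair)    # dE/dx_i
    np.add.at(grad, j, -g_pair)   # dE/dx_j
    return grad

def toy_velocity(x, target):
    """Hypothetical stand-in for a learned velocity field: relax toward target."""
    return target - x

def sample(x0, target, bonds, n_steps=50, guidance_scale=0.0):
    """Euler integration of the toy flow with an optional energy correction."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        x = x + dt * toy_velocity(x, target)                  # model-driven update
        x = x - guidance_scale * bond_energy_grad(x, bonds)   # guidance term
    return x

def chain(spacing, n_atoms=4):
    """Linear chain of atoms along the x axis with the given spacing."""
    x = np.zeros((n_atoms, 3))
    x[:, 0] = spacing * np.arange(n_atoms)
    return x

bonds = np.array([[0, 1], [1, 2], [2, 3]])
start = chain(3.0)    # initial sample with over-stretched bonds
target = chain(2.0)   # geometry the toy "model" prefers, still strained (r0 = 1.5)

e_plain = bond_energy(sample(start, target, bonds), bonds)
e_guided = bond_energy(sample(start, target, bonds, guidance_scale=0.1), bonds)
print(f"unguided energy: {e_plain:.4f}, guided energy: {e_guided:.4f}")
```

In this toy setting the guided trajectory ends at a lower bond energy than the unguided one, which mirrors the qualitative claim that the correction biases sampling toward lower-energy regions. A real application would replace `bond_energy_grad` with the gradient of a full force field and `toy_velocity` with the pretrained model, and the step size would need tuning to stay within the smoothness regime where the descent argument holds.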

Force fields

Structure-based drug design

Protein-ligand generation

Flow matching

Guidance

Chemoinformatics

Diffusion models

Authors

Helen Lai

AstraZeneca AB

Tingyu Wang

NVIDIA

Hassan Sirelkhatim

NVIDIA

Joe Eaton

NVIDIA

Howard Huang

NVIDIA

Brad Rees

NVIDIA

Ola Engkvist

Chalmers, Computer Science and Engineering, Data Science and AI

University of Gothenburg

Jon Paul Janet

AstraZeneca AB

Xiaoyun Wang

NVIDIA

Alessandro Tibo

AstraZeneca AB

Journal of Cheminformatics

1758-2946 (ISSN) 17582946 (eISSN)

Vol. 18, Issue 1, 55

Subject categories (SSIF 2025)

Materials Chemistry

Bioinformatics (Computational Biology)

DOI

10.1186/s13321-026-01198-2

PubMed

42069689

More information

Last updated

2026-05-08