WEISS: Wasserstein efficient sampling strategy for LLMs in drug design
Journal article, 2025

Autoregressive models have gained popularity in drug design because they can efficiently sample novel molecules from a vast chemical space. Sampling novel and diverse molecules efficiently is crucial for downstream tasks such as reinforcement learning, which aim to identify novel molecules with pre-defined desired properties. Existing sampling strategies such as multinomial sampling and beam search often suffer from mode collapse or are computationally inefficient, respectively. To address these limitations, we introduce WEISS (Wasserstein efficient sampling strategy), a framework that enables autoregressive models to efficiently sample diverse molecules. Our approach, which draws inspiration from the Wasserstein autoencoder, is compatible with any encoder-decoder-based autoregressive model. We show that WEISS effectively mitigates mode collapse while maintaining a token sampling speed 25 times faster than beam search. Furthermore, we showcase the efficacy of the proposed method on various drug design tasks, such as molecular property optimization and single-step retrosynthesis prediction.

efficient sampling

autoencoders

LLMs

drug design

Author

Riccardo Tedoldi

University of Trento

AstraZeneca AB

Junyong Li

Student at Chalmers

Ola Engkvist

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Andrea Passerini

University of Trento

Annie M. Westerlund

AstraZeneca AB

Alessandro Tibo

AstraZeneca AB

MACHINE LEARNING-SCIENCE AND TECHNOLOGY

2632-2153 (eISSN)

Vol. 6, Issue 2, 025048

Subject Categories (SSIF 2025)

Probability Theory and Statistics

Pharmaceutical and Medical Biotechnology

DOI

10.1088/2632-2153/addc33

Related datasets

Data set [dataset]

URI: https://github.com/r1cc4r2o/weiss

More information

Latest update

6/19/2025