WEISS: Wasserstein efficient sampling strategy for LLMs in drug design
Journal article, 2025

Autoregressive models have gained popularity in drug design because they can efficiently sample novel molecules from a vast chemical space. Sampling novel and diverse molecules efficiently is crucial for downstream tasks such as reinforcement learning, which aim to identify novel molecules with predefined desired properties. Existing sampling strategies such as multinomial sampling and beam search often suffer from mode collapse or are computationally inefficient, respectively. To address these limitations, we introduce WEISS (Wasserstein efficient sampling strategy), a framework that enables autoregressive models to sample diverse molecules efficiently. Our approach, which draws inspiration from the Wasserstein autoencoder, is compatible with any encoder-decoder-based autoregressive model. We show that WEISS effectively mitigates mode collapse while maintaining a token sampling speed 25 times faster than beam search. Furthermore, we showcase the efficacy of the proposed method on various drug design tasks, such as molecular property optimization and single-step retrosynthesis prediction.

efficient sampling

autoencoders

LLMs

drug design

Authors

Riccardo Tedoldi

Universita degli Studi di Trento

AstraZeneca AB

Junyong Li

Student at Chalmers

Ola Engkvist

Chalmers, Computer Science and Engineering, Data Science and AI

Andrea Passerini

Universita degli Studi di Trento

Annie M. Westerlund

AstraZeneca AB

Alessandro Tibo

AstraZeneca AB

Machine Learning: Science and Technology

2632-2153 (eISSN)

Vol. 6, Issue 2, 025048

Subject categories (SSIF 2025)

Probability Theory and Statistics

Pharmaceutical and Medical Biotechnology

DOI

10.1088/2632-2153/addc33

Related datasets

Data set [dataset]

URI: https://github.com/r1cc4r2o/weiss

More information

Last updated

2025-06-19