Fast Treatment Personalization with Latent Bandits in Fixed-Confidence Pure Exploration
Journal article, 2023

Personalizing treatments for patients often involves a period of trial-and-error search until an optimal choice is found. To minimize suffering and other costs, it is critical to make this process as short as possible. When treatments have primarily short-term effects, search can be performed with multi-armed bandits (MAB), but these typically require long exploration periods to guarantee optimality. In this work, we design MAB algorithms which provably identify optimal treatments quickly by leveraging prior knowledge of the types of decision processes (patients) we can encounter, in the form of a latent variable model. We present two algorithms, the Latent LP-based Track and Stop (LLPT) Explorer and the Divergence Explorer, for this setting: fixed-confidence pure-exploration latent bandits. We give a lower bound on the stopping time of any algorithm which is correct at a given certainty level, and prove that the expected stopping time of the LLPT Explorer matches the lower bound in the high-certainty limit. Finally, we present results from an experimental study based on realistic simulation data for Alzheimer’s disease, demonstrating that our formulation and algorithms lead to a significantly reduced stopping time.

Authors

Newton Mwai Kinyanjui

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Emil Carlsson

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Fredrik Johansson

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Transactions on Machine Learning Research

2835-8856 (eISSN)

Vol. 2023

Subject Categories (SSIF 2025)

Probability Theory and Statistics

More information

Latest update

3/19/2025