BelMan: An Information-Geometric Approach to Stochastic Bandits

Debabrota Basu; Pierre Senellart; Stéphane Bressan

doi:10.1007/978-3-030-46133-1_11

Paper in proceedings, 2020

We propose a Bayesian information-geometric approach to the exploration–exploitation trade-off in stochastic multi-armed bandits. The uncertainty about reward generation and belief is represented using the manifold of joint distributions of rewards and beliefs. Accumulated information is summarised by the barycentre of these joint distributions, the pseudobelief-reward. While the pseudobelief-reward facilitates information accumulation through exploration, a second mechanism, the pseudobelief-focal-reward, is needed to increase exploitation by gradually focusing on higher rewards. The resulting algorithm, BelMan, alternates between projecting the pseudobelief-focal-reward onto the belief-reward distributions to choose the arm to play, and projecting the updated belief-reward distributions onto the pseudobelief-focal-reward. We prove that BelMan is asymptotically optimal and incurs sublinear regret growth. We instantiate BelMan for stochastic bandits with Bernoulli and exponential rewards, and for a real-life application of scheduling queueing bandits. Comparative evaluation against the state of the art shows that BelMan is competitive for Bernoulli bandits and in many cases outperforms other approaches for exponential and queueing bandits.

Authors

Debabrota Basu

Chalmers, Computer Science and Engineering (Chalmers), Data Science

Pierre Senellart

Institut National de Recherche en Informatique et en Automatique (INRIA)

École Normale Supérieure (ENS)

Stéphane Bressan

National University of Singapore (NUS)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

0302-9743 (ISSN) 1611-3349 (eISSN)

Vol. 11908 LNAI, pp. 167-183
978-3-030-46132-4 (ISBN)

European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019
Würzburg, Germany

Subject Categories (SSIF 2011)

Probability Theory and Statistics

Computer Science

Computer Systems

DOI

10.1007/978-3-030-46133-1_11

More information

Latest update

2021-02-26
