Ballooning Multi-Armed Bandits
Paper in proceedings, 2020

We introduce ballooning multi-armed bandits (BL-MAB), a novel extension to the classical stochastic MAB model. In the BL-MAB model, the set of available arms grows (or balloons) over time. The regret in a BL-MAB setting is computed with respect to the best available arm at each time. We first observe that the existing stochastic MAB algorithms are not regret-optimal for the BL-MAB model. We show that if the best arm is equally likely to arrive at any time, a sub-linear regret cannot be achieved, irrespective of the arrival of other arms. We further show that if the best arm is more likely to arrive in the early rounds, one can achieve sub-linear regret. Making reasonable assumptions on the arrival distribution of the best arm in terms of the thinness of the distribution's tail, we prove that the proposed algorithm achieves sub-linear instance-independent regret. We further quantify explicit dependence of regret on the arrival distribution parameters.
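The regret notion described above, measured against the best arm available in each round rather than a single fixed best arm, can be illustrated with a minimal sketch. This is not the paper's algorithm. The function name `ballooning_regret` and its parameters (`means`, `arrivals`, `pulls`) are hypothetical, and the sketch assumes known arm means, so it computes expected (pseudo-)regret for a given sequence of pulls.

```python
from typing import Sequence


def ballooning_regret(
    means: Sequence[float],
    arrivals: Sequence[int],
    pulls: Sequence[int],
) -> float:
    """Cumulative pseudo-regret against the best arm available in each round.

    means[i]    -- expected reward of arm i (assumed known, for illustration)
    arrivals[i] -- first round (1-indexed) in which arm i can be pulled
    pulls[t-1]  -- index of the arm pulled in round t
    """
    regret = 0.0
    for t, arm in enumerate(pulls, start=1):
        # The arm set balloons: only arms that have arrived by round t count.
        available = [i for i, a in enumerate(arrivals) if a <= t]
        if arm not in available:
            raise ValueError(f"arm {arm} is not available in round {t}")
        # The benchmark is the best arm available now, not the best overall.
        best_mean = max(means[i] for i in available)
        regret += best_mean - means[arm]
    return regret
```

For example, with `means = [0.5, 0.9]`, `arrivals = [1, 3]` and `pulls = [0, 0, 0, 1]`, the only round that incurs regret is round 3, where the better arm has arrived but the older arm is still pulled, for a total of about 0.4.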

Authors

Ganesh Ghalme

Indian Institute of Science

Swapnil Vilas Dhamal

Chalmers, Space, Earth and Environment, Physical Resource Theory

Shweta Jain

Indian Institute of Technology

Sujit Gujar

International Institute of Information Technology

Y. Narahari

Indian Institute of Science

Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS

1548-8403 (ISSN) 1558-2914 (eISSN)

1849-1851
978-1-4503-7518-4 (ISBN)

19th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2020
Auckland, New Zealand

Areas of Advance

Information and Communication Technology

Subject Categories

Computer Science

Computer Systems

More information

Latest update

2024-08-09