Optimizing sequential decision-making under risk: Strategic allocation with switching penalties
Journal article, 2025

This paper considers the multiarmed bandit (MAB) problem augmented with a critical real-world consideration: the cost of switching between decisions. Our work addresses the largely unexplored domain of risk-averse MAB problems with switching penalties. Such scenarios are not merely theoretical constructs; they reflect numerous practical applications. Our contribution is threefold. First, we explore how switching costs and risk aversion influence decision-making in MAB problems. Second, we present novel theoretical results, including the development of the Risk-Averse Switching Index (RASI), a heuristic solution method that addresses the dual challenges of risk aversion and switching costs, and we demonstrate its near-optimal performance. RASI is grounded in dynamic coherent risk measures, which enable a time-consistent evaluation of risk and reward. Third, through numerical experiments, we validate the algorithm's effectiveness and practical applicability, providing decision-makers with insights and tools for risk-averse environments with inherent switching costs.
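To make the interplay between risk aversion and switching penalties concrete, the following is a minimal illustrative sketch and not the RASI method from the paper. It assumes a one-step (myopic) rule, uses an empirical lower-tail CVaR as a stand-in for a coherent risk measure, and charges a fixed penalty for leaving the current arm. The function names, the parameters `alpha` and `switching_cost`, and the sample data are all hypothetical.

```python
import math


def cvar_lower_tail(rewards, alpha):
    """Empirical lower-tail CVaR: mean of the worst ceil(alpha * n) rewards.

    alpha = 1.0 recovers the plain sample mean (risk-neutral);
    smaller alpha puts more weight on bad outcomes (more risk-averse).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(rewards)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def choose_arm(samples_by_arm, current_arm, alpha, switching_cost):
    """Myopic rule (an assumption, not RASI): pick the arm with the best
    risk-adjusted score, where every arm other than the current one
    pays the switching penalty."""
    best_arm, best_score = None, -math.inf
    for arm, samples in enumerate(samples_by_arm):
        score = cvar_lower_tail(samples, alpha)
        if arm != current_arm:
            score -= switching_cost
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm


# Hypothetical reward samples: arm 0 is safe, arm 1 has a higher mean but is risky.
samples = [[1, 1, 1, 1], [0, 0, 4, 4]]

risk_neutral_cheap_switch = choose_arm(samples, current_arm=0, alpha=1.0, switching_cost=0.5)
risk_neutral_costly_switch = choose_arm(samples, current_arm=0, alpha=1.0, switching_cost=1.5)
risk_averse_cheap_switch = choose_arm(samples, current_arm=0, alpha=0.5, switching_cost=0.5)
```

In this toy setting, a risk-neutral decision-maker moves to the riskier arm only when the switching penalty is small, while a risk-averse one stays with the safe arm even then. A dynamic, time-consistent treatment such as the one the paper describes would evaluate risk over the whole horizon rather than one step at a time.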

Multiarmed bandit problem

Switching penalties

Dynamic coherent risk measures

Risk-averse decision-making

Stochastic programming

Author

Milad Malekipirbazari

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

European Journal of Operational Research

0377-2217 (ISSN)

Vol. 321, Issue 1, p. 160-176

Subject Categories

Computer Science

DOI

10.1016/j.ejor.2024.09.023

More information

Latest update

11/14/2024