Multi-armed bandits in the wild: Pitfalls and strategies in online experiments
Journal article, 2019
Context: Delivering faster value to customers with online experimentation is an emerging practice in industry. Multi-Armed Bandit (MAB) based experiments have the potential to deliver faster results, with a better allocation of resources, than traditional A/B experiments. However, the incorrect use of MAB-based experiments can lead to wrong conclusions that may hurt the company's business.
Objective: The objective of this study is to understand the pitfalls and restrictions of using MABs in online experiments, as well as the strategies used to overcome them.
Method: This research uses a multiple case study method with eleven experts across five software companies, together with simulations to triangulate the data on some of the identified limitations.
Results: This study analyzes several limitations faced by companies using MABs and discusses the strategies used to overcome them. The results are summarized into practitioners' guidelines with criteria for selecting an appropriate experimental design.
Conclusion: MAB algorithms have the potential to deliver faster results, with a better allocation of resources, than traditional A/B experiments. However, mistakes can occur and hinder the benefits of such an approach. Together with the provided guidelines, we intend this paper to serve as reference material for practitioners during the design of an online experiment.
Multi-armed bandit pitfalls