Reinforcement Learning: Efficient Communication and Sample Efficient Learning
Doctoral thesis, 2024
In this thesis, we explore several topics in reinforcement learning, a computational approach to sequential decision-making under uncertainty. The first part investigates how efficient communication emerges between reinforcement learning agents in signaling games. Support for efficient communication, in an information-theoretic sense, is an important characteristic of human languages. Our agents develop artificial languages that are as efficient as human languages and also structurally similar to them. We also combine reinforcement learning with iterated learning and find that this combination accounts for human color naming systems better than either model does on its own.
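To make the setup concrete, the following is a minimal illustrative sketch (not the thesis code) of a Lewis signaling game in which a sender and a receiver learn a shared convention through simple stateless value updates with epsilon-greedy play; the game size, learning rate, and update rule are all assumptions made for illustration.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_signals, n_actions = 3, 3, 3   # assumed small game size
lr, eps, episodes = 0.1, 0.1, 20000        # assumed hyperparameters

q_sender = np.zeros((n_states, n_signals))     # sender: state -> signal values
q_receiver = np.zeros((n_signals, n_actions))  # receiver: signal -> action values

def eps_greedy(q_row):
    # Explore with probability eps, otherwise act greedily.
    if rng.random() < eps:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

for _ in range(episodes):
    state = rng.integers(n_states)
    signal = eps_greedy(q_sender[state])
    action = eps_greedy(q_receiver[signal])
    reward = 1.0 if action == state else 0.0   # shared reward: receiver guesses the state
    q_sender[state, signal] += lr * (reward - q_sender[state, signal])
    q_receiver[signal, action] += lr * (reward - q_receiver[signal, action])

# After training, the greedy policies often form a one-to-one signaling convention.
print(np.argmax(q_sender, axis=1), np.argmax(q_receiver, axis=1))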
The second part focuses on sample-efficient algorithms for multi-armed bandits. We propose Thompson sampling-based methods for regret minimization in multi-armed bandits with clustered arms. Additionally, we address the problem of identifying an optimal policy with fixed confidence in bandits with linear constraints. For this problem, we characterize a lower bound and show how it depends on a non-convex projection onto the normal cone spanned by the constraints. We leverage these insights to derive asymptotically optimal algorithms for pure exploration in bandits with linear constraints. Finally, we apply techniques from multi-armed bandits to develop active learning strategies for ordering items based on noisy preference feedback.
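For reference, here is a minimal sketch of standard Thompson sampling for a Bernoulli multi-armed bandit with Beta priors; it illustrates the general principle only and is not the clustered-arm or constrained algorithm developed in the thesis. The arm means and horizon are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.3, 0.5, 0.7])   # assumed arm means, unknown to the learner
alpha = np.ones_like(true_means)         # Beta(1, 1) prior successes
beta = np.ones_like(true_means)          # Beta(1, 1) prior failures

horizon, regret = 5000, 0.0
for _ in range(horizon):
    samples = rng.beta(alpha, beta)      # one posterior sample per arm
    arm = int(np.argmax(samples))        # play the arm with the largest sample
    reward = float(rng.random() < true_means[arm])
    alpha[arm] += reward                 # posterior update for a Bernoulli reward
    beta[arm] += 1.0 - reward
    regret += true_means.max() - true_means[arm]

print(f"cumulative regret over {horizon} rounds: {regret:.1f}")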
Color Naming
Pure Exploration
Emergent Communication
Preference Learning
Numeral Systems
Efficient Communication
Iterated Learning
Multi-armed Bandits
Contextual Bandits
Reinforcement Learning
Author
Emil Carlsson
Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI
Pure Exploration in Bandits with Linear Constraints
Proceedings of Machine Learning Research, Vol. 238 (2024), p. 334-342
Paper in proceeding
Pragmatic Reasoning in Structured Signaling Games
Proceedings of the 44th Annual Meeting of the Cognitive Science Society: Cognitive Diversity, CogSci 2022 (2022), p. 2831-2837
Paper in proceeding
Thompson Sampling for Bandits with Clustered Arms
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (2021)
Paper in proceeding
Learning Approximate and Exact Numeral Systems via Reinforcement Learning
Proceedings of the 43rd Annual Meeting of the Cognitive Science Society: Comparative Cognition: Animal Minds, CogSci 2021, Vol. 43 (2021)
Paper in proceeding
A reinforcement-learning approach to efficient communication
PLoS ONE, Vol. 15 (2020)
Journal article
Cultural evolution via iterated learning and communication explains efficient color naming systems
Journal of Language Evolution, Vol. In Press (2024)
Journal article
Herman Bergström, Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson. Active preference learning for ordering items in- and out-of-sample. To appear in the Thirty-Eighth Annual Conference on Neural Information Processing Systems.
Deep Reinforcement Learning: Principles and Applications in Cognitive Science and Networks (Chalmers AI Research Centre (CHAIR)), 2019-10-01 -- 2024-10-01.
Areas of Advance
Information and Communication Technology
Subject Categories
Computer and Information Science
Infrastructure
C3SE (Chalmers Centre for Computational Science and Engineering)
ISBN
978-91-8103-128-7
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5586
Publisher
Chalmers
Hörsalsvägen 11, 412 58 Göteborg
Opponent: Kenny Smith, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Scotland