Reinforcement Learning: Efficient Communication and Sample Efficient Learning
Doctoral thesis, 2024

Life is full of decision-making problems where only partial information is available to the decision-maker and where the outcomes are uncertain. Whether it is choosing a restaurant for dinner, selecting a movie on a streaming service, or conveying concepts during a lecture, the decision-maker observes only the results of their choices, without knowing what would have happened had they acted differently. Because of this, the decision-maker must carefully balance exploiting their current knowledge, to make good decisions now, against exploring the unknown to gather new information that might lead to even better decisions in the future.


In this thesis, we explore several topics in reinforcement learning, a computational approach to sequential decision-making under uncertainty. The first part investigates how efficient communication emerges between reinforcement learning agents in signaling games. Support for efficient communication, in an information-theoretic sense, is an important characteristic of human languages. Our agents create artificial languages that are as efficient as human languages and also similar to them. We also combine reinforcement learning with iterated learning and find that this combination accounts better for human color naming systems than either model does individually.
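The thesis studies richer signaling games than can be shown here; as a minimal, hedged illustration of the basic setting, the sketch below pairs two simple Roth-Erev reinforcement learners in a Lewis signaling game. All names and parameters are illustrative, not the thesis's actual models.

```python
import random

def lewis_signaling_game(n_states=3, rounds=5000, seed=1):
    """Two Roth-Erev reinforcement learners in a Lewis signaling game.

    The sender maps states to signals, the receiver maps signals to
    actions; both are rewarded when the receiver's action matches the
    state. Returns the success rate over the last 500 rounds.
    """
    rng = random.Random(seed)
    # Propensity tables, initialised uniformly.
    sender = [[1.0] * n_states for _ in range(n_states)]    # state -> signal
    receiver = [[1.0] * n_states for _ in range(n_states)]  # signal -> action

    def choose(weights):
        return rng.choices(range(len(weights)), weights=weights)[0]

    rewards = []
    for _ in range(rounds):
        state = rng.randrange(n_states)
        signal = choose(sender[state])
        action = choose(receiver[signal])
        reward = 1.0 if action == state else 0.0
        # Roth-Erev update: reinforce the chosen entries by the reward.
        sender[state][signal] += reward
        receiver[signal][action] += reward
        rewards.append(reward)
    return sum(rewards[-500:]) / 500
```

Over enough rounds the two learners typically coordinate on a (possibly partial) signaling system, so the late-round success rate rises well above the 1/3 chance baseline.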


The second part focuses on sample-efficient algorithms for multi-armed bandits. We propose Thompson sampling-based methods for regret minimization in multi-armed bandits with clustered arms. Additionally, we address finding optimal policies with fixed confidence in bandits with linear constraints. For this problem, we characterize a lower bound and illustrate how it depends on a non-convex projection onto the normal cone spanned by the constraints. We leverage these insights to derive asymptotically optimal algorithms for pure exploration in bandits with linear constraints. Finally, we apply techniques from multi-armed bandits to develop active learning strategies for ordering items based on noisy preference feedback.
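As a hedged sketch of the Thompson sampling idea underlying the bandit algorithms above (the thesis's methods for clustered arms and linear constraints are more involved), here is standard Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors; all parameter values are illustrative.

```python
import random

def thompson_sampling(arm_means, pulls=2000, seed=0):
    """Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n = len(arm_means)
    alpha = [1.0] * n  # posterior parameter: 1 + observed successes
    beta = [1.0] * n   # posterior parameter: 1 + observed failures
    counts = [0] * n
    for _ in range(pulls):
        # Draw one sample from each arm's posterior and play the argmax,
        # so arms are chosen with their posterior probability of being best.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        i = max(range(n), key=samples.__getitem__)
        reward = 1.0 if rng.random() < arm_means[i] else 0.0
        alpha[i] += reward
        beta[i] += 1.0 - reward
        counts[i] += 1
    return counts
```

Because posteriors of clearly suboptimal arms concentrate below the best arm's, the best arm quickly receives the vast majority of pulls while exploration of the others decays.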

Color Naming

Pure Exploration

Emergent Communication

Preference Learning

Numeral Systems

Efficient Communication

Iterated Learning

Multi-armed Bandits

Contextual Bandits

Reinforcement Learning

Hörsalsvägen 11, 412 58 Göteborg
Opponent: Kenny Smith, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Scotland

Author

Emil Carlsson

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Pure Exploration in Bandits with Linear Constraints

Proceedings of Machine Learning Research, Vol. 238 (2024), pp. 334-342

Paper in proceeding

Pragmatic Reasoning in Structured Signaling Games

Proceedings of the 44th Annual Meeting of the Cognitive Science Society: Cognitive Diversity, CogSci 2022 (2022), pp. 2831-2837

Paper in proceeding

Thompson Sampling for Bandits with Clustered Arms

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (2021)

Paper in proceeding

Learning Approximate and Exact Numeral Systems via Reinforcement Learning

Proceedings of the 43rd Annual Meeting of the Cognitive Science Society: Comparative Cognition: Animal Minds, CogSci 2021, Vol. 43 (2021)

Paper in proceeding

A reinforcement-learning approach to efficient communication

PLoS ONE, Vol. 15 (2020)

Journal article

Cultural evolution via iterated learning and communication explains efficient color naming systems

Journal of Language Evolution, Vol. In Press (2024)

Journal article

Herman Bergström˚, Emil Carlsson˚, Devdatt Dubhashi, Fredrik D. Johansson. Active preference learning for ordering items in- and out-of-sample. To appear in the Thirty-Eighth Annual Conference on Neural Information Processing Systems.

Deep Reinforcement Learning: Principles and Applications in Cognitive Science and Networks

Chalmers AI Research Centre (CHAIR), 2019-10-01 -- 2024-10-01.

Areas of Advance

Information and Communication Technology

Subject Categories

Computer and Information Science

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

ISBN

978-91-8103-128-7

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5586

Publisher

Chalmers


More information

Latest update

12/5/2024