Understanding and Evaluating Policies for Sequential Decision-Making
Licentiate thesis, 2023

Sequential decision-making is a critical component of many complex systems in domains such as finance, healthcare, and robotics. The long-term goal of a sequential decision-making process is to optimize the policy under which decisions are made. In safety-critical domains, the search for an optimal policy must be based on observational data, as new decision-making strategies need to be carefully evaluated before they can be tested in practice. In this thesis, we highlight the importance of understanding sequential decision-making at different stages of this procedure. For example, to assess which policies can be evaluated with the available data, we need to understand the policy that actually generated the data. Once we are given a policy to evaluate, we also need to understand how it differs from current practice.

First, we focus on the evaluation process, where a target policy is evaluated using off-policy data collected under a different so-called behavior policy. This problem, commonly referred to as off-policy evaluation, is often solved with importance sampling (IS) techniques. Despite their popularity, IS-based methods suffer from high variance and are hard to diagnose. To address these issues, we propose estimating the behavior policy using prototype learning. Using the learned prototypes, we describe differences between target and behavior policies, allowing for better assessment of the IS estimates.
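As an illustration of the off-policy evaluation setting described above, the following is a minimal sketch of an ordinary trajectory-wise importance sampling estimator. It is not the prototype-based method proposed in the thesis. The function name `is_estimate`, the `(state, action, reward)` trajectory format, and the callables `target_prob` and `behavior_prob` are assumptions made for this example; in practice, `behavior_prob` would typically be an estimate of the behavior policy learned from the data.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical trajectory format: one (state, action, reward) tuple per time step.
Step = Tuple[int, int, float]


def is_estimate(
    trajectories: Sequence[Sequence[Step]],
    target_prob: Callable[[int, int], float],
    behavior_prob: Callable[[int, int], float],
    gamma: float = 1.0,
) -> float:
    """Ordinary importance sampling estimate of the target policy's value."""
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0  # cumulative importance ratio for this trajectory
        ret = 0.0     # discounted return observed under the behavior policy
        for t, (state, action, reward) in enumerate(trajectory):
            # Ratio of target to behavior action probabilities; assumes the
            # behavior policy gives nonzero probability to every logged action.
            weight *= target_prob(state, action) / behavior_prob(state, action)
            ret += (gamma ** t) * reward
        total += weight * ret
    return total / len(trajectories)


# Toy usage: the behavior policy picks one of two actions uniformly at random,
# while the target policy always picks action 1.
toy_data = [[(0, 1, 1.0)], [(0, 0, 0.0)]]
uniform = lambda s, a: 0.5
always_one = lambda s, a: 1.0 if a == 1 else 0.0
value = is_estimate(toy_data, always_one, uniform)
```

Because the weight is a product of per-step ratios, it can become very large or very small on long trajectories, which is one source of the high variance mentioned above.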

Next, we take a clinical direction and study the sequential treatment of patients with rheumatoid arthritis (RA). The armamentarium of disease-modifying anti-rheumatic drugs (DMARDs) for RA patients has greatly expanded over the past decades. However, it is still unclear which treatments work best for individual patients. To examine how observational data can be used to evaluate new policies, we describe the most common patterns of DMARDs in a large patient registry from the US. We find that the number of unique patterns is large, indicating a significant variation in clinical practice which can be exploited for evaluation purposes. However, additional assumptions may be required to arrive at statistically sound results.
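To make the notion of a treatment pattern concrete, the sketch below counts how often each distinct treatment sequence occurs in a set of patient histories. The function name `count_patterns` and the drug labels "A", "B", and "C" are placeholders invented for this example. They are not registry data, and the pattern definition used in the thesis may differ.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def count_patterns(sequences: Iterable[Sequence[str]]) -> Counter:
    """Count occurrences of each distinct treatment sequence."""
    # Tuples are hashable, so each full sequence can serve as a Counter key.
    return Counter(tuple(seq) for seq in sequences)


# Hypothetical patient histories with placeholder drug labels.
histories = [
    ["A", "B"],
    ["A", "B"],
    ["A", "C"],
    ["B"],
]
patterns = count_patterns(histories)
n_unique = len(patterns)             # number of distinct patterns
most_common = patterns.most_common(1)  # most frequent pattern and its count
```

A large number of distinct patterns relative to the number of patients suggests varied practice, but also means that each individual pattern is supported by few observations.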

Observational data

Sequential decision-making

Reinforcement learning

Off-policy evaluation

Rheumatoid arthritis

Analysen, EDIT Building, Hörsalsvägen 11, Chalmers
Opponent: Herke van Hoof, assistant professor, AMLab, University of Amsterdam, the Netherlands


Anton Matsson

Chalmers, Computer Science and Engineering, Data Science and AI

Case-Based Off-Policy Evaluation Using Prototype Learning

Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022), PMLR, Vol. 180 (2022), p. 1339-1349

Paper in proceedings

Matsson A, Solomon DH, Crabtree MM, Harrison RW, Litman HJ, Johansson FD. Patterns in the sequential treatment of rheumatoid arthritis patients starting a b/tsDMARD: 10-year experience from a US-based registry.

Machine learning for causal inference from observational data with applications in healthcare

Wallenberg AI, Autonomous Systems and Software Program, 2020-08-03 -- 2024-08-03.


Computer and Information Science


C3SE (Chalmers Centre for Computational Science and Engineering)


