Understanding and Evaluating Policies for Sequential Decision-Making

Anton Matsson

Licentiate thesis, 2023

Sequential decision-making is a critical component of complex systems in many domains, such as finance, healthcare, and robotics. The long-term goal of a sequential decision-making process is to optimize the policy under which decisions are made. In safety-critical domains, the search for an optimal policy must be based on observational data, as new decision-making strategies need to be carefully evaluated before they can be tested in practice. In this thesis, we highlight the importance of understanding sequential decision-making at different stages of this procedure. For example, to assess which policies can be evaluated with the available data, we need to understand the policy that actually generated the data. Once we are given a policy to evaluate, we also need to understand how it differs from current practice.

First, we focus on the evaluation process, where a target policy is evaluated using off-policy data collected under a different so-called behavior policy. This problem, commonly referred to as off-policy evaluation, is often solved with importance sampling (IS) techniques. Despite their popularity, IS-based methods suffer from high variance and are hard to diagnose. To address these issues, we propose estimating the behavior policy using prototype learning. Using the learned prototypes, we describe differences between target and behavior policies, allowing for better assessment of the IS estimates.
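As a rough illustration of the importance sampling idea mentioned above, the following minimal Python sketch computes an ordinary trajectory-wise IS estimate of a target policy's value. It is not the method proposed in the thesis: the function name, the toy one-state problem, and the assumption that the behavior policy's action probabilities are known exactly are all hypothetical choices made for this example. In practice the behavior policy typically has to be estimated from the data.

```python
def importance_sampling_estimate(trajectories, target_policy, behavior_policy):
    """Ordinary importance sampling estimate of the target policy's value.

    trajectories: list of trajectories, each a list of (state, action, reward).
    target_policy, behavior_policy: functions (state, action) -> probability.
    Assumes the behavior policy gives nonzero probability to every logged action.
    """
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0       # product of per-step probability ratios
        ret = 0.0          # undiscounted return of the trajectory
        for state, action, reward in trajectory:
            weight *= target_policy(state, action) / behavior_policy(state, action)
            ret += reward
        total += weight * ret
    return total / len(trajectories)


# Hypothetical toy problem: one state, two actions, one step per trajectory.
def behavior_policy(state, action):
    return 0.5  # data were collected by choosing uniformly at random


def target_policy(state, action):
    return 0.9 if action == 1 else 0.1  # policy we want to evaluate


# Logged data: action 1 always yields reward 1, action 0 yields reward 0.
trajectories = [
    [(0, 0, 0.0)],
    [(0, 1, 1.0)],
    [(0, 1, 1.0)],
    [(0, 0, 0.0)],
]

estimate = importance_sampling_estimate(trajectories, target_policy, behavior_policy)
print(estimate)
```

In this toy setting the estimate is close to 0.9, the probability with which the target policy picks the rewarded action. With longer trajectories the product of ratios can become very large or very small, which is one source of the high variance noted above.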

Next, we take a clinical direction and study the sequential treatment of patients with rheumatoid arthritis (RA). The armamentarium of disease-modifying anti-rheumatic drugs (DMARDs) for RA patients has greatly expanded over the past decades. However, it is still unclear which treatments work best for individual patients. To examine how observational data can be used to evaluate new policies, we describe the most common patterns of DMARDs in a large patient registry from the US. We find that the number of unique patterns is large, indicating significant variation in clinical practice which can be exploited for evaluation purposes. However, additional assumptions may be required to arrive at statistically sound results.
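One simple way to describe treatment patterns of this kind is to treat each patient's ordered sequence of therapies as a pattern and count how often each occurs. The Python sketch below shows that idea on made-up data. The function name and the placeholder therapy labels "A", "B", and "C" are hypothetical, and the sketch does not reproduce the analysis or the registry data used in the thesis.

```python
from collections import Counter


def count_treatment_patterns(sequences):
    """Count how often each ordered treatment sequence occurs.

    sequences: iterable of per-patient lists of therapy labels, in order.
    Returns a Counter keyed by the sequence as a tuple.
    """
    return Counter(tuple(sequence) for sequence in sequences)


# Hypothetical per-patient sequences with placeholder therapy labels.
sequences = [
    ["A", "B"],
    ["A", "B"],
    ["A", "C", "B"],
    ["B"],
]

patterns = count_treatment_patterns(sequences)
n_unique = len(patterns)               # number of distinct patterns
most_common = patterns.most_common(1)  # the single most frequent pattern
print(n_unique, most_common)
```

A large number of distinct patterns relative to the number of patients suggests varied practice, but it also means that many patterns are observed only a few times.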

Observational data

Sequential decision-making

Reinforcement learning

Off-policy evaluation

Rheumatoid arthritis

Analysen, EDIT Building, Hörsalsvägen 11, Chalmers

Opponent: Herke van Hoof, assistant professor, AMLab, University of Amsterdam, the Netherlands

Author

Anton Matsson

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI


Case-Based Off-Policy Evaluation Using Prototype Learning

Proceedings of Machine Learning Research, Vol. 180 (2022), p. 1339-1349

Paper in proceeding

Matsson A, Solomon DH, Crabtree MM, Harrison RW, Litman HJ, Johansson FD. Patterns in the sequential treatment of rheumatoid arthritis patients starting a b/tsDMARD: 10-year experience from a US-based registry.

Machine Learning for Causal Inference from Observational Data with Applications in Healthcare

Wallenberg AI, Autonomous Systems and Software Program, 2020-08-03 -- 2024-08-03.


Subject Categories (SSIF 2011)

Computer and Information Science

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

Publisher

Chalmers