Understanding and Evaluating Policies for Sequential Decision-Making
Licentiate thesis, 2023
First, we focus on the evaluation process, where a target policy is evaluated using off-policy data collected under a different so-called behavior policy. This problem, commonly referred to as off-policy evaluation, is often solved with importance sampling (IS) techniques. Despite their popularity, IS-based methods suffer from high variance and are hard to diagnose. To address these issues, we propose estimating the behavior policy using prototype learning. Using the learned prototypes, we describe differences between target and behavior policies, allowing for better assessment of the IS estimates.
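As a rough illustration of the importance sampling estimators mentioned above, the following minimal sketch computes the ordinary and weighted (self-normalized) IS estimates of a target policy's value from trajectories collected under a behavior policy. The function name, argument layout, and the assumption that per-step action probabilities under both policies are already available are illustrative choices, not taken from the thesis. In the thesis the behavior policy is estimated with prototype learning, which this sketch does not attempt to reproduce.

```python
import numpy as np


def importance_sampling_estimates(target_probs, behavior_probs, returns):
    """Ordinary and weighted importance sampling estimates of policy value.

    target_probs, behavior_probs: one sequence per trajectory holding the
        probability of each observed action under the target and behavior
        policy (hypothetical inputs, assumed known or estimated elsewhere).
    returns: observed return of each trajectory.
    """
    # Trajectory weight = product over time steps of pi_target / pi_behavior.
    weights = np.array([
        np.prod(np.asarray(p_t, dtype=float) / np.asarray(p_b, dtype=float))
        for p_t, p_b in zip(target_probs, behavior_probs)
    ])
    returns = np.asarray(returns, dtype=float)

    # Ordinary IS: unbiased when the behavior probabilities are correct,
    # but the variance can be high when the weights are spread out.
    ordinary = float(np.mean(weights * returns))

    # Weighted IS: normalizes by the sum of weights, trading a small bias
    # for lower variance.
    weighted = float(np.sum(weights * returns) / np.sum(weights))
    return ordinary, weighted, weights
```

Returning the weights alongside the estimates makes it possible to inspect how much individual trajectories dominate the result, which is one simple way to judge whether an estimate deserves trust.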
Next, we take a clinical direction and study the sequential treatment of patients with rheumatoid arthritis (RA). The armamentarium of disease-modifying anti-rheumatic drugs (DMARDs) for RA patients has greatly expanded over the past decades. However, it is still unclear which treatments work best for individual patients. To examine how observational data can be used to evaluate new policies, we describe the most common patterns of DMARD use in a large patient registry from the US. We find that the number of unique patterns is large, indicating significant variation in clinical practice, which can be exploited for evaluation purposes. However, additional assumptions may be required to arrive at statistically sound results.
Observational data
Sequential decision-making
Reinforcement learning
Off-policy evaluation
Rheumatoid arthritis
Author
Anton Matsson
Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI
Case-Based Off-Policy Evaluation Using Prototype Learning
Proceedings of Machine Learning Research, Vol. 180 (2022), p. 1339-1349
Paper in proceeding
Matsson A, Solomon DH, Crabtree MM, Harrison RW, Litman HJ, Johansson FD. Patterns in the sequential treatment of rheumatoid arthritis patients starting a b/tsDMARD: 10-year experience from a US-based registry.
Machine Learning for Causal Inference from Observational Data with Applications in Healthcare
Wallenberg AI, Autonomous Systems and Software Program, 2020-08-03 -- 2024-08-03.
Subject Categories
Computer and Information Science
Infrastructure
C3SE (Chalmers Centre for Computational Science and Engineering)
Publisher
Chalmers
Analysen, EDIT Building, Hörsalsvägen 11, Chalmers
Opponent: Herke van Hoof, assistant professor, AMLab, University of Amsterdam, the Netherlands