Case-Based Off-Policy Evaluation Using Prototype Learning

Anton Matsson; Fredrik Johansson

Paper in proceedings, 2022

Importance sampling (IS) is often used to perform off-policy evaluation, but it is prone to several issues, especially when the behavior policy is unknown and must be estimated from data. Significant differences between the target and behavior policies can result in uncertain value estimates due to, for example, high variance. Standard practices such as inspecting IS weights may be insufficient to diagnose such problems and to determine for which types of inputs the policies differ in suggested actions and resulting values. To address this, we propose estimating the behavior policy for IS using prototype learning. The learned prototypes provide a condensed summary of the input-action space, which allows for describing differences between policies and assessing the support for evaluating a certain target policy. In addition, we can describe a value estimate in terms of prototypes to understand which parts of the target policy have the most impact on the estimate. We find that this provides new insights in the examination of a learned policy for sepsis management. Moreover, we study the bias resulting from restricting models to use prototypes, how this bias propagates to IS weights and estimated values, and how it varies with history length.
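
To make the role of the IS weights concrete, the following is a minimal sketch of a plain trajectory-wise IS estimate. It is not the prototype-based method of the paper: the function name `is_value_estimate`, the toy probabilities, and the effective-sample-size diagnostic are assumptions chosen for illustration, and the behavior-policy probabilities are simply given here rather than estimated from data.

```python
import numpy as np


def is_value_estimate(target_probs, behavior_probs, returns):
    """Trajectory-wise importance sampling estimate of a target policy's value.

    target_probs / behavior_probs: one sequence per trajectory, holding the
    probability each policy assigns to the logged action at every time step.
    returns: observed return of each trajectory under the behavior policy.
    (Hypothetical helper for illustration, not the authors' implementation.)
    """
    # Each trajectory weight is the product of per-step probability ratios.
    weights = np.array([
        np.prod(np.asarray(p, dtype=float) / np.asarray(q, dtype=float))
        for p, q in zip(target_probs, behavior_probs)
    ])
    returns = np.asarray(returns, dtype=float)
    # Ordinary (unnormalized) IS estimate: mean of weighted returns.
    estimate = float(np.mean(weights * returns))
    # Effective sample size: a common summary of how uneven the weights are.
    ess = float(weights.sum() ** 2 / np.sum(weights ** 2))
    return estimate, weights, ess


# Toy data (assumed): two logged trajectories of length 2 and 1.
target_probs = [[0.5, 0.5], [0.2]]
behavior_probs = [[0.5, 0.25], [0.4]]
returns = [1.0, 2.0]

estimate, weights, ess = is_value_estimate(target_probs, behavior_probs, returns)
print(estimate, weights, ess)
```

In this toy case the first trajectory receives four times the weight of the second, so the effective sample size falls below the two logged trajectories. This kind of weight inspection shows that the policies disagree, but not for which kinds of inputs they do.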

Authors

Anton Matsson

Chalmers, Computer Science and Engineering, Data Science and AI

Fredrik Johansson

Chalmers, Computer Science and Engineering, Data Science and AI

Proceedings of Machine Learning Research

2640-3498 (eISSN)

Vol. 180, pp. 1339-1349

38th Conference on Uncertainty in Artificial Intelligence, UAI 2022
Eindhoven, Netherlands

Subject categories (SSIF 2025)

Computer Sciences

Subject categories (SSIF 2011)

Probability Theory and Statistics

More information

Last updated

2026-05-08