Off-Policy Evaluation with Out-of-Sample Guarantees
Journal article, 2023

We consider the problem of evaluating the performance of a decision policy using past observational data. The outcome of a policy is measured in terms of a loss (a.k.a. disutility or negative reward), and the main problem is making valid inferences about its out-of-sample loss when the past data were observed under a different, and possibly unknown, policy. Using a sample-splitting method, we show that it is possible to draw such inferences with finite-sample coverage guarantees about the entire loss distribution, rather than just its mean. Importantly, the method takes into account model misspecifications of the past policy, including unmeasured confounding. The evaluation method can be used to certify the performance of a policy using observational data under a specified range of credible model assumptions.
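The core idea of sample splitting with finite-sample coverage over the loss distribution can be illustrated with a small sketch. The code below is a hypothetical toy illustration, not the paper's method: it assumes a known behavior policy, uses importance weights on a held-out calibration split, and forms a conservative weighted quantile (placing the leftover mass at infinity) as a high-probability upper bound on the target policy's loss. All variable names and the synthetic data-generating process are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic setup: binary actions, observed losses.
n = 2000
x = rng.normal(size=n)
# Past (behavior) policy: probability of taking action 1 given x.
p_obs = 1.0 / (1.0 + np.exp(-x))
a = rng.binomial(1, p_obs)
loss = np.abs(x) + 0.5 * a + rng.normal(scale=0.1, size=n)

# Target policy to evaluate: always take action 0.
p_eval = np.where(a == 0, 1.0, 0.0)

# Importance weights w = p_eval / p_obs evaluated at the observed actions.
w = p_eval / np.where(a == 1, p_obs, 1.0 - p_obs)

# Sample splitting: the second half is held out for calibration.
cal = slice(n // 2, n)
w_cal, loss_cal = w[cal], loss[cal]

def weighted_upper_quantile(values, weights, alpha):
    """Conservative (1 - alpha)-quantile of the weighted empirical
    loss distribution; the unassigned unit of mass sits at +inf."""
    order = np.argsort(values)
    v, wt = values[order], weights[order]
    total = wt.sum() + 1.0  # extra mass at +inf keeps the bound valid
    cum = np.cumsum(wt) / total
    idx = np.searchsorted(cum, 1.0 - alpha)
    return v[idx] if idx < len(v) else np.inf

# With probability about 1 - alpha, a fresh loss under the target
# policy falls below this bound (in this idealized, well-specified toy).
bound = weighted_upper_quantile(loss_cal, w_cal, alpha=0.1)
print(bound)
```

Note that this sketch assumes the behavior policy is known exactly; the paper's contribution is precisely to retain guarantees when that model is misspecified, e.g. due to unmeasured confounding, which this toy does not capture.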

Authors

Sofia Ek

Uppsala universitet

Dave Zachariah

Uppsala universitet

Fredrik Johansson

Chalmers, Data- och informationsteknik, Data Science och AI

Göteborgs universitet

Petre Stoica

Uppsala universitet

Transactions on Machine Learning Research

2835-8856 (eISSN)

Vol. 2023

Subject categories (SSIF 2025)

Probability theory and statistics

More information

Last updated

2025-11-17