Off-Policy Evaluation with Out-of-Sample Guarantees
Journal article, 2023

We consider the problem of evaluating the performance of a decision policy using past observational data. The outcome of a policy is measured in terms of a loss (a.k.a. disutility or negative reward), and the main problem is to make valid inferences about its out-of-sample loss when the past data were observed under a different, and possibly unknown, policy. Using a sample-splitting method, we show that it is possible to draw such inferences with finite-sample coverage guarantees for the entire loss distribution, rather than just its mean. Importantly, the method takes into account model misspecifications of the past policy, including unmeasured confounding. The evaluation method can be used to certify the performance of a policy using observational data under a specified range of credible model assumptions.
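To make the sample-splitting idea concrete, the following is a minimal, hypothetical sketch of a weighted split-conformal upper bound on a target policy's out-of-sample loss, using importance weights between the target and logging policies. All data, policies, and variable names are illustrative inventions, the logging policy is assumed known here, and the sketch omits the paper's robustness to misspecification and unmeasured confounding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observational data: contexts X, binary actions A drawn from a
# known logging policy, and nonnegative losses L. Purely illustrative.
n = 2000
X = rng.normal(size=n)
p_log = 1.0 / (1.0 + np.exp(-X))             # P(A=1 | X) under the past policy
A = rng.binomial(1, p_log)
L = (A - (X > 0)) ** 2 + 0.1 * rng.normal(size=n) ** 2

# Target policy to evaluate: a deterministic threshold rule.
a_target = (X > 0).astype(int)

# Importance weights pi_target(A|X) / pi_log(A|X); zero where the observed
# action disagrees with the deterministic target policy.
p_obs = np.where(A == 1, p_log, 1.0 - p_log)
w = (A == a_target).astype(float) / p_obs

# Sample splitting: hold out a calibration half to form the quantile bound.
idx = rng.permutation(n)
cal = idx[: n // 2]

alpha = 0.1
# Weighted empirical (1 - alpha) quantile of the calibration losses: an
# upper bound on the target policy's loss, valid in finite samples when
# the weights are correct (no confounding correction in this sketch).
order = np.argsort(L[cal])
w_sorted = w[cal][order]
L_sorted = L[cal][order]
cum = np.cumsum(w_sorted) / (np.sum(w_sorted) + np.max(w))
k = np.searchsorted(cum, 1.0 - alpha)
bound = L_sorted[min(k, len(L_sorted) - 1)]
print(f"90% upper bound on target-policy loss: {bound:.3f}")
```

The denominator adds the largest weight to the normalizer, mimicking the conformal correction for the as-yet-unseen test point; with uniform weights this reduces to ordinary split conformal prediction.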

Authors

Sofia Ek

Uppsala University

Dave Zachariah

Uppsala University

Fredrik Johansson

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

University of Gothenburg

Petre Stoica

Uppsala University

Transactions on Machine Learning Research

2835-8856 (eISSN)

Vol. 2023

Subject Categories (SSIF 2025)

Probability Theory and Statistics

More information

Latest update

11/17/2025