Locally Pareto-Optimal Interpretations for Black-Box Machine Learning Models
Paper in proceeding, 2026

Creating meaningful interpretations for black-box machine learning models involves balancing two often conflicting objectives: accuracy and explainability. Exploring the trade-off between these objectives is essential for developing trustworthy interpretations. While many techniques for multi-objective interpretation synthesis have been developed, they typically lack formal guarantees on the Pareto-optimality of the results. Methods that do provide such guarantees, on the other hand, often face severe scalability limitations when exploring the Pareto-optimal space. To address this, we develop a framework based on local optimality guarantees that enables more scalable synthesis of interpretations. Specifically, we consider the problem of synthesizing a set of Pareto-optimal interpretations with local optimality guarantees, within the immediate neighborhood of each solution. Our approach begins with a multi-objective learning or search technique, such as Multi-Objective Monte Carlo Tree Search, to generate a best-effort set of Pareto-optimal candidates with respect to accuracy and explainability. We then verify local optimality for each candidate as a Boolean satisfiability problem, which we solve using a SAT solver. We demonstrate the efficacy of our approach on a set of benchmarks, comparing it against previous methods for exploring the Pareto-optimal front of interpretations. In particular, we show that our approach yields interpretations that closely match those synthesized by methods offering global guarantees.
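The two-stage idea in the abstract (collect best-effort Pareto candidates, then check each one against its immediate neighborhood) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: interpretations are encoded as bit tuples, the neighborhood is taken to be Hamming distance 1, `toy_score` is an invented objective, and the neighborhood check is done by brute-force enumeration as a stand-in for the SAT encoding the authors describe.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical encoding: one bit per feature the interpretation uses.
Candidate = Tuple[int, ...]
# (accuracy, explainability); both objectives are maximized.
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b in both objectives and differs from b."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b


def pareto_front(cands: Iterable[Candidate],
                 score: Callable[[Candidate], Score]) -> List[Candidate]:
    """Keep the candidates that no other candidate in the given set dominates."""
    unique = list(dict.fromkeys(cands))  # drop duplicates, keep order
    return [c for c in unique
            if not any(dominates(score(o), score(c)) for o in unique)]


def neighbors(c: Candidate) -> Iterator[Candidate]:
    """Assumed neighborhood: all candidates at Hamming distance 1 from c."""
    for i in range(len(c)):
        yield c[:i] + (1 - c[i],) + c[i + 1:]


def is_locally_pareto_optimal(c: Candidate,
                              score: Callable[[Candidate], Score]) -> bool:
    """Brute-force stand-in for the SAT query: does any neighbor dominate c?"""
    s = score(c)
    return not any(dominates(score(n), s) for n in neighbors(c))


def toy_score(c: Candidate) -> Score:
    """Invented objective: accuracy in percent, explainability = fewer features."""
    accuracy = 50 + 30 * c[0] + 20 * c[1] + 0 * c[2]
    explainability = -sum(c)
    return (accuracy, explainability)


# Stage 1 would come from a search procedure; here the candidates are hand-picked.
candidates = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (0, 0, 0)]
front = pareto_front(candidates, toy_score)
# Stage 2: keep only candidates that survive the neighborhood check.
verified = [c for c in front if is_locally_pareto_optimal(c, toy_score)]
```

Enumeration only works here because the toy neighborhood is tiny. Posing the same "does a dominating neighbor exist?" question as a satisfiability query is what would let a solver handle neighborhoods too large to list.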

Machine Learning

Parametric Inference

Continuous Optimization

Machine Translation

Discrete Optimization

Logical Analysis

Authors

Aniruddha Joshi

University of California

S. Chakraborty

Indian Institute of Technology

S. Akshay

Indian Institute of Technology

Shetal Shah

Indian Institute of Technology

Hazem Torfah

University of Gothenburg

Chalmers, Computer Science and Engineering, Formal Methods

Sanjit A. Seshia

University of California

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

0302-9743 (ISSN) 1611-3349 (eISSN)

Vol. 16145 LNCS 321-341
9783032087065 (ISBN)

23rd International Symposium on Automated Technology for Verification and Analysis, ATVA 2025
Bengaluru, India

Subject categories (SSIF 2025)

Computer Sciences

DOI

10.1007/978-3-032-08707-2_15

More information

Last updated

2025-11-17