Interpretable machine learning models for predicting with missing values
Licentiate thesis, 2023

Machine learning models are often used in situations where model inputs are missing, either during training or at the time of prediction. If missing values are not handled appropriately, they can introduce bias or yield models that cannot be applied in practice without imputing the unobserved variables. However, imputation is often inaccurate, and complex imputation functions are difficult to interpret.

In this thesis, we focus on prediction in the presence of incomplete data at test time, using interpretable models that allow humans to understand the predictions. Interpretability is especially important when critical decisions are at stake, such as in healthcare.
First, we investigate the setting where variables are missing in recurring patterns and sample sizes per pattern are small. We propose SPSM, which shares coefficients between a main model and pattern-specific submodels in order to use data efficiently and to remain independent of imputation. To enable interpretability, sparsity allows the model to be expressed as a short description. A minimal sketch of the coefficient-sharing idea follows below.
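
To make the coefficient-sharing idea concrete, here is a minimal Python sketch. It is not the implementation from the thesis: the function names are hypothetical, a closed-form ridge penalty stands in for the actual estimator, and the sketch assumes at least some fully observed training rows.

    import numpy as np

    def fit_ridge(X, y, lam=1.0, target=None):
        # Ridge regression in closed form; coefficients are shrunk toward
        # `target` instead of toward zero, which is how sharing enters:
        # argmin_w ||Xw - y||^2 + lam * ||w - target||^2
        d = X.shape[1]
        if target is None:
            target = np.zeros(d)
        A = X.T @ X + lam * np.eye(d)
        b = X.T @ y + lam * target
        return np.linalg.solve(A, b)

    def fit_pattern_submodels(X, y, lam_main=1.0, lam_share=10.0):
        # Hypothetical SPSM-style training: fit a main model on the fully
        # observed rows, then one submodel per missingness pattern over its
        # observed columns, shrunk toward the main model's coefficients so
        # that small-sample patterns borrow strength from the shared data.
        obs = ~np.isnan(X)
        full = obs.all(axis=1)
        w_main = fit_ridge(X[full], y[full], lam_main)
        submodels = {}
        for pattern in {tuple(row) for row in obs}:
            rows = (obs == np.array(pattern)).all(axis=1)
            cols = np.flatnonzero(pattern)
            if cols.size == 0:
                continue  # skip the all-missing pattern
            w = fit_ridge(X[np.ix_(rows, cols)], y[rows], lam_share,
                          target=w_main[cols])
            submodels[pattern] = (cols, w)
        return w_main, submodels

    def predict(x, w_main, submodels):
        # Route the test point to the submodel matching its pattern of
        # observed features; unseen patterns fall back to the main model
        # restricted to the observed columns, so no imputation is needed.
        pattern = tuple(~np.isnan(x))
        if pattern in submodels:
            cols, w = submodels[pattern]
        else:
            cols = np.flatnonzero(pattern)
            w = w_main[cols]
        return float(x[cols] @ w)
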
Then, we explore situations where missingness does not occur in recurring patterns and suggest the sparse linear rule model MINTY, which naturally trades off interpretability against goodness of fit while accounting for missing values at test time. To this end, we learn replacement variables that indicate which features in a rule can be used instead when the original feature was not measured, assuming some redundancy among the covariates. A sketch of rule evaluation with replacements follows below.
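
Again as a hypothetical illustration rather than the method from the papers: the sketch below evaluates thresholded rules whose features may be missing, falling back to learned replacement features. For brevity the original threshold is reused for a replacement; a real model would learn one per feature.

    import numpy as np

    def evaluate_rule(x, feature, threshold, replacements):
        # Evaluate one thresholded rule; if its feature is missing, try
        # the learned replacement features in order.
        for j in (feature, *replacements.get(feature, ())):
            if not np.isnan(x[j]):
                return float(x[j] >= threshold)
        return 0.0  # nothing observed: treat the rule as not firing

    def predict(x, rules, weights, replacements, bias=0.0):
        # A sparse linear rule model: a weighted sum of rule activations.
        z = [evaluate_rule(x, f, t, replacements) for f, t in rules]
        return bias + float(np.dot(weights, z))

    # Example with two rules; feature 1 can stand in for feature 0.
    x = np.array([np.nan, 2.5, 0.3])
    rules = [(0, 2.0), (2, 0.5)]          # (feature index, threshold)
    replacements = {0: (1,)}              # learned substitutes per feature
    print(predict(x, rules, [1.0, -0.5], replacements))  # -> 1.0
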

Our results show that the proposed interpretable models can be used for prediction with missing values without relying on imputation. We conclude that more work is needed on evaluating interpretable machine learning models in the context of missing values at test time.

missing values

machine learning

healthcare

interpretable machine learning

Author

Lena Stempfle

Chalmers, Computer Science and Engineering, Data Science and AI

Lena Stempfle, Fredrik D. Johansson - Learning replacement variables in interpretable rule-based models

Sharing Pattern Submodels for Prediction with Missing Values

Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023; Vol. 37 (2023), pp. 9882–9890

Paper in proceedings

Predicting progression and cognitive decline in amyloid-positive patients with Alzheimer’s disease

Alzheimer's Research and Therapy; Vol. 13 (2021)

Journal article

WASP AI/MLX Assistant Professor

Wallenberg AI, Autonomous Systems and Software Program, 2019-08-01 – 2023-08-01.

Subject categories

Computer and Information Sciences

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

Publisher

Chalmers

Analysen, EDIT Building, Hörsalsvägen 11, Chalmers

Online

Opponent: Gaël Varoquaux, PhD, Research Director, Inria (the French national institute for research in digital science and technology), France

More information

Last updated

2023-03-15