Optimal Subsampling Designs Under Measurement Constraints
Doctoral thesis, 2023
In this thesis we present a theory and framework for optimal design in general subsampling problems. The methodology is applicable to a wide range of settings and inference problems, including regression modelling, parametric density estimation, and finite population inference. We discuss the use of auxiliary information and sequential optimal design for the implementation of optimal subsampling methods in practice and study the asymptotic properties of the resulting estimators.
The proposed methods are illustrated and evaluated in three problem areas: subsample selection for optimal prediction in active machine learning (Paper I), optimal control sampling in the analysis of safety-critical events in naturalistic driving studies (Paper II), and optimal subsampling in a scenario generation context for virtual safety assessment of an advanced driver assistance system (Paper III). In Paper IV we present a unified theory that encompasses and generalises the methods of Papers I–III and introduce a class of expected-distance-minimising designs with good theoretical and practical properties.
In Papers I–III we demonstrate a sample size reduction of 10–50% with the proposed methods compared to simple random sampling and traditional importance sampling methods, for the same level of performance. We also propose a novel class of invariant linear optimality criteria, which in Paper IV are shown to reach 90–99% D-efficiency with 90–95% lower computational demand.
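As a rough illustration of the general idea behind unequal probability sampling with inverse probability weighting, the sketch below compares sampling proportional to an auxiliary variable with uniform sampling when estimating a population total. It is not the optimal design derived in the thesis: the synthetic population, the Hansen-Hurwitz (with-replacement) estimator, and all names (`hansen_hurwitz_total`, `hansen_hurwitz_variance`, `x`, `y`) are assumptions chosen for illustration only.

```python
import random


def hansen_hurwitz_total(y, p, n, rng):
    """Estimate sum(y) from n draws with replacement, drawn with probabilities p.

    Each sampled value is weighted by the inverse of its selection probability.
    """
    idx = rng.choices(range(len(y)), weights=p, k=n)
    return sum(y[i] / p[i] for i in idx) / n


def hansen_hurwitz_variance(y, p, n):
    """Exact variance of the estimator above for a known finite population."""
    total = sum(y)
    return sum(pi * (yi / pi - total) ** 2 for yi, pi in zip(y, p)) / n


# Hypothetical synthetic population: outcome y is roughly proportional to
# an auxiliary variable x that is known for every unit before sampling.
rng = random.Random(0)
N = 1000
x = [rng.uniform(1.0, 10.0) for _ in range(N)]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Two candidate designs: uniform probabilities and probabilities proportional to x.
uniform = [1.0 / N] * N
sum_x = sum(x)
proportional = [xi / sum_x for xi in x]

n = 50
var_uniform = hansen_hurwitz_variance(y, uniform, n)
var_prop = hansen_hurwitz_variance(y, proportional, n)
estimate = hansen_hurwitz_total(y, proportional, n, rng)

print(f"true total:             {sum(y):.1f}")
print(f"estimate (prop. to x):  {estimate:.1f}")
print(f"variance, uniform:      {var_uniform:.1f}")
print(f"variance, prop. to x:   {var_prop:.1f}")
```

In this toy setting the auxiliary variable is strongly related to the outcome, so sampling proportional to it gives a lower variance than uniform sampling at the same sample size. How much is gained in practice depends on how informative the auxiliary information is.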
inverse probability weighting
unequal probability sampling
optimal design
active sampling
M-estimation
Author
Henrik Imberg
Chalmers, Mathematical Sciences, Applied Mathematics and Statistics
Optimal sampling in unbiased active learning
Proceedings of Machine Learning Research, Vol. 108 (2020), pp. 559-569
Paper in proceedings
Optimization of Two-Phase Sampling Designs with Application to Naturalistic Driving Studies
IEEE Transactions on Intelligent Transportation Systems, Vol. 23 (2022), pp. 3575-3588
Journal article
Active sampling: A machine-learning-assisted framework for finite population inference with optimal subsamples
Technometrics (2024)
Journal article
Supporting the interaction of Humans and Automated vehicles: Preparing for the Environment of Tomorrow (Shape-IT)
European Commission (EU) (EC/H2020/860410), 2019-10-01 -- 2023-09-30.
Statistical methods for interpreting driver behaviour and crash causes in measurement-intensive naturalistic traffic studies
Swedish Research Council (VR) (2012-5995), 2012-01-01 -- 2015-12-31.
Statistical sampling in machine learning
Stiftelsen Wilhelm och Martina Lundgrens Vetenskapsfond (2020-3446), 2020-05-01 -- 2020-12-31.
Stiftelsen Wilhelm och Martina Lundgrens Vetenskapsfond (2019-3132), 2019-05-01 -- 2019-12-31.
Improved quantitative driver behavior models and safety assessment methods for ADAS and AD (QUADRIS)
VINNOVA (2020-05156), 2021-04-01 -- 2024-03-31.
Foundations
Basic sciences
Subject categories
Probability theory and statistics
ISBN
978-91-7905-826-5
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5292
Publisher
Chalmers
Pascal
Opponent: Frank Miller, Linköping University and Stockholm University, Sweden