Optimal Subsampling Designs Under Measurement Constraints
Doctoral thesis, 2023
In this thesis we present a theory and framework for optimal design in general subsampling problems. The methodology is applicable to a wide range of settings and inference problems, including regression modelling, parametric density estimation, and finite population inference. We discuss the use of auxiliary information and sequential optimal design for the implementation of optimal subsampling methods in practice and study the asymptotic properties of the resulting estimators.
The proposed methods are illustrated and evaluated in three problem areas: subsample selection for optimal prediction in active machine learning (Paper I), optimal control sampling in the analysis of safety-critical events in naturalistic driving studies (Paper II), and optimal subsampling in a scenario generation context for virtual safety assessment of an advanced driver assistance system (Paper III). In Paper IV we present a unified theory that encompasses and generalises the methods of Papers I–III and introduce a class of expected-distance-minimising designs with good theoretical and practical properties.
In Papers I–III we demonstrate a sample size reduction of 10–50% with the proposed methods compared to simple random sampling and traditional importance sampling methods, for the same level of performance. We propose a novel class of invariant linear optimality criteria, which in Paper IV are shown to reach 90–99% D-efficiency with 90–95% lower computational demand.
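To make the comparison with simple random sampling concrete, the following is a minimal sketch of inverse probability weighting under unequal probability sampling. It is not the optimal design developed in the thesis. It assumes with-replacement sampling, a Hansen-Hurwitz type estimator of a finite population total, and a synthetic population in which an auxiliary variable x is known for every unit. All names (hansen_hurwitz_total, hh_variance, p_srs, p_pps) and the data-generating model are hypothetical and chosen for illustration only.

```python
import random


def hansen_hurwitz_total(y, p, n, rng):
    """Draw n units with replacement with probabilities p and return
    the inverse-probability-weighted estimate of sum(y)."""
    idx = rng.choices(range(len(y)), weights=p, k=n)
    return sum(y[i] / p[i] for i in idx) / n


def hh_variance(y, p, n):
    """Exact variance of the estimator above under with-replacement sampling."""
    total = sum(y)
    return sum(pi * (yi / pi - total) ** 2 for yi, pi in zip(y, p)) / n


rng = random.Random(0)
N, n = 1000, 50

# Synthetic population (assumption): x is cheap and known for all units,
# y is the expensive measurement and roughly proportional to x.
x = [rng.uniform(1.0, 10.0) for _ in range(N)]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Two sampling designs: equal probabilities versus probabilities proportional to x.
p_srs = [1.0 / N] * N
sum_x = sum(x)
p_pps = [xi / sum_x for xi in x]

var_srs = hh_variance(y, p_srs, n)
var_pps = hh_variance(y, p_pps, n)
estimate = hansen_hurwitz_total(y, p_pps, n, rng)

print(f"true total          : {sum(y):.1f}")
print(f"estimate (x-weighted): {estimate:.1f}")
print(f"variance ratio pps/srs: {var_pps / var_srs:.3f}")
```

In this toy setting the auxiliary variable is strongly related to the outcome, so sampling proportionally to x gives a smaller variance than equal-probability sampling at the same sample size. How much is gained in practice depends on how informative the auxiliary information is.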
inverse probability weighting
unequal probability sampling
optimal design
active sampling
M-estimation
Author
Henrik Imberg
Chalmers, Mathematical Sciences, Applied Mathematics and Statistics
Optimal sampling in unbiased active learning
Proceedings of Machine Learning Research, Vol. 108 (2020), p. 559-569
Paper in proceeding
Optimization of Two-Phase Sampling Designs with Application to Naturalistic Driving Studies
IEEE Transactions on Intelligent Transportation Systems, Vol. 23 (2022), p. 3575-3588
Journal article
Active sampling: A machine-learning-assisted framework for finite population inference with optimal subsamples
Technometrics (2024)
Journal article
Supporting the interaction of Humans and Automated vehicles: Preparing for the Environment of Tomorrow (Shape-IT)
European Commission (EC) (EC/H2020/860410), 2019-10-01 -- 2023-09-30.
Statistical methods to assess driving behaviour and causation of accidents from large naturalistic driving studies
Swedish Research Council (VR) (2012-5995), 2012-01-01 -- 2015-12-31.
Statistical sampling in machine learning
Stiftelsen Wilhelm och Martina Lundgrens Vetenskapsfond (2019-3132), 2019-05-01 -- 2019-12-31.
Stiftelsen Wilhelm och Martina Lundgrens Vetenskapsfond (2020-3446), 2020-05-01 -- 2020-12-31.
Improved quantitative driver behavior models and safety assessment methods for ADAS and AD (QUADRIS)
VINNOVA (2020-05156), 2021-04-01 -- 2024-03-31.
Roots
Basic sciences
Subject Categories
Probability Theory and Statistics
ISBN
978-91-7905-826-5
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5292
Publisher
Chalmers
Pascal
Opponent: Frank Miller, Linköping University and Stockholm University, Sweden