Unequal Probability Sampling in Active Learning and Traffic Safety

Henrik Imberg

Licentiate thesis, 2019

This thesis addresses a problem arising in large and expensive experiments where incomplete data come in abundance but statistical analyses require the collection of additional information, which is costly. For practical and economic reasons, the analysis must be restricted to a subset of the original database, which inevitably causes a loss of valuable information; it is therefore essential to choose this subset so that it captures as much of the available information as possible.

Using finite population sampling methodology, we address the issue of appropriate subset selection. We show how sample selection may be optimised to maximise precision in estimating various parameters and quantities of interest, and extend the existing finite population sampling methodology to an adaptive, sequential sampling framework, where the information required for optimising the sampling scheme may be updated iteratively as more data are collected. The implications of model misspecification are discussed, and the robustness of the finite population sampling methodology against model misspecification is highlighted.

The proposed methods are illustrated and evaluated on two problems: subset selection for optimal prediction in active learning (Paper I), and optimal control sampling for analysis of safety-critical events in naturalistic driving studies (Paper II). It is demonstrated that the use of optimised sample selection may reduce the number of records for which complete information needs to be collected by as much as 50%, compared to conventional methods and uniform random sampling.
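As a rough illustration of the general idea of unequal probability sampling described in the abstract, the following minimal Python sketch draws a Poisson sample with inclusion probabilities proportional to an auxiliary variable and estimates a population total with the Horvitz-Thompson (inverse probability weighted) estimator. It is not the optimised or sequential scheme developed in the thesis; the function names, the toy population and the size-proportional design are all assumptions made for illustration.

```python
import random
import statistics


def inclusion_probabilities(aux, n):
    """Probabilities proportional to a positive auxiliary variable.

    The target expected sample size is n; capping at 1 can make the
    actual expected size slightly smaller. (Hypothetical helper.)
    """
    total = sum(aux)
    return [min(1.0, n * a / total) for a in aux]


def poisson_sample(probs, rng):
    """Include each unit independently with its own probability."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


def horvitz_thompson_total(y, probs, sample):
    """Weight each sampled value by the inverse of its inclusion probability."""
    return sum(y[i] / probs[i] for i in sample)


if __name__ == "__main__":
    rng = random.Random(1)
    N, n, reps = 500, 50, 200

    # Toy population (an assumption): the costly outcome y is roughly
    # proportional to a cheap auxiliary variable x known for every record.
    x = [rng.uniform(1, 100) for _ in range(N)]
    y = [2 * xi + rng.gauss(0, 5) for xi in x]

    designs = {
        "uniform": [n / N] * N,
        "proportional to x": inclusion_probabilities(x, n),
    }
    print("true total:", round(sum(y), 1))
    for name, probs in designs.items():
        estimates = [
            horvitz_thompson_total(y, probs, poisson_sample(probs, rng))
            for _ in range(reps)
        ]
        print(name, "mean:", round(statistics.mean(estimates), 1),
              "sd:", round(statistics.pstdev(estimates), 1))
```

Both designs give unbiased estimates of the total because of the inverse probability weighting. In a toy setting like this one, where the outcome is nearly proportional to the auxiliary variable, the size-proportional design would typically show a smaller spread across repetitions than uniform sampling at the same expected sample size.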

naturalistic driving

active learning

sampling weighting

optimal design

sequential sampling

probability sampling

Euler, Mathematical Sciences, Skeppsgränd 3, Gothenburg

Opponent: Associate Professor Krzysztof Bartoszek, Division of Statistics and Machine Learning, Department of Computer and Information Science, Linköping University, Linköping, Sweden


Author

Henrik Imberg

Chalmers, Mathematical Sciences, Applied Mathematics and Statistics


Optimal sampling in unbiased active learning

Proceedings of Machine Learning Research, Vol. 108 (2020), p. 559-569

Paper in proceedings

Optimization of Two-Phase Sampling Designs with Application to Naturalistic Driving Studies

IEEE Transactions on Intelligent Transportation Systems, Vol. 23 (2022), p. 3575-3588

Journal article

Subject Categories (SSIF 2011)

Probability Theory and Statistics

Publisher

Chalmers