Outlier Detection as a Safety Measure for Safety Critical Deep Learning
Doctoral thesis, 2023

Context: Deep learning (DL) has proven to be a valuable component in object detection and semantic segmentation tasks, as the techniques have shown significant performance gains compared to hand-crafted image processing algorithms. DL refers to an optimization process in which a model learns properties and parameters by itself through an iterative process running on labeled data. The resulting model contains abstract features that are hard to explain intuitively, which makes it challenging to ensure that the model will work as intended in safety critical applications (SCA).

Aim: The aim of this thesis has been to study how parameters from DL can be connected to verification and testing for safety critical applications, and which extensions are necessary to verify deep neural networks. More specifically, this thesis has investigated outlier detection as one testing method for detecting when the model is operating on unfamiliar data.
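To make the idea of "detecting when the model is operating on unfamiliar data" concrete, the sketch below shows one widely used baseline supervisor: maximum softmax probability (MSP). It is a swapped-in illustration, not necessarily a detector studied in the thesis. The function names and the `threshold` value are hypothetical. The supervisor accepts an input only if the classifier's top class probability is high enough; otherwise it flags the input as a possible outlier.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a single vector of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_supervisor(logits: List[float], threshold: float) -> Tuple[int, float, bool]:
    """Hypothetical MSP supervisor: return (predicted class, confidence, accepted).

    The input is flagged as a possible outlier (accepted=False) when the
    highest class probability falls below `threshold` (an assumed value
    that would have to be tuned on validation data).
    """
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[pred]
    return pred, confidence, confidence >= threshold


# A peaked (familiar-looking) output vs. a flat, uncertain one.
print(msp_supervisor([6.0, 0.5, 0.2], threshold=0.9))
print(msp_supervisor([1.0, 0.9, 1.1], threshold=0.9))
```

A low MSP does not prove that an input is out-of-distribution. It is only a cheap signal that the model is uncertain. The thesis compares several OOD detection methods, so this baseline should be read as one example from that family.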

Method: A comprehensive review of DL metrics and outlier detection metrics has been conducted. These metrics have been used to construct several new metrics that evaluate how the model behaves when it encounters out-of-distribution (OOD) samples.
An evaluation framework has been constructed that performs objective evaluation of OOD detection methods. The framework has been applied to a range of image datasets, starting with small-scale images and continuing with realistic camera-based use cases from the automotive domain.
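The thesis summary does not specify which metrics its evaluation framework reports. The following is a minimal sketch of two metrics commonly used to compare OOD detectors objectively: AUROC and the false positive rate at 95% true positive rate. It assumes that each detector outputs an outlier score (higher means "more OOD"), and it treats OOD samples as the positive class. The function names are hypothetical.

```python
import math
from typing import Sequence


def auroc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    OOD sample receives a higher outlier score than a random in-distribution
    (ID) sample. Ties count as half a win (Mann-Whitney U formulation)."""
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def fpr_at_tpr(id_scores: Sequence[float], ood_scores: Sequence[float],
               tpr_target: float = 0.95) -> float:
    """Fraction of ID samples wrongly flagged as OOD when the threshold is
    chosen so that at least `tpr_target` of the OOD samples are flagged
    (a sample is flagged when score >= threshold)."""
    ranked = sorted(ood_scores, reverse=True)
    k = math.ceil(tpr_target * len(ranked))  # number of OOD samples to catch
    threshold = ranked[k - 1]
    return sum(s >= threshold for s in id_scores) / len(id_scores)


# Toy scores from a hypothetical detector.
id_scores = [0.1, 0.2, 0.3, 0.4]
ood_scores = [0.35, 0.5, 0.6, 0.9]
print(auroc(id_scores, ood_scores), fpr_at_tpr(id_scores, ood_scores))
```

The quadratic pairwise loop keeps the definition transparent. For large datasets, a sort-based or library implementation would be preferable.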

Results: This thesis found that one major issue with deploying DL in SCAs is quantifying and tracing performance measures. The issue exists because it is difficult to define requirements and test cases for DL models, and to express the model's performance in safety related metrics. While DL performance is commendable, the technique should not be deployed in SCAs if that performance cannot be ensured. Our experiments show that the effect of OOD samples can be mitigated by extending the model with safety measures, i.e., measures that reduce the impact of undesired behavior. This thesis shows how to use a risk-coverage trade-off metric that connects DL performance with functional safety requirements, so that safety engineers can allocate safety requirements to DL components and evaluate their performance.
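A risk-coverage trade-off can be illustrated with the standard risk-coverage curve from selective classification. This is a common formulation, and the thesis's exact metric may differ. The sketch assumes that each sample has a confidence value (for example, from an OOD supervisor) and a flag recording whether the model's prediction was correct. Samples are accepted in order of decreasing confidence. After accepting the k most confident samples, coverage is k/n and risk is the error rate among the accepted samples. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def risk_coverage_curve(confidence: Sequence[float],
                        correct: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (coverage, risk) pairs, accepting samples from most to least
    confident. Ties in confidence are broken by input order (stable sort)."""
    if len(confidence) != len(correct) or not confidence:
        raise ValueError("need equally long, non-empty inputs")
    order = sorted(range(len(confidence)), key=lambda i: confidence[i], reverse=True)
    n = len(order)
    curve = []
    errors = 0
    for k, idx in enumerate(order, start=1):
        errors += 0 if correct[idx] else 1
        curve.append((k / n, errors / k))  # (fraction accepted, error rate among accepted)
    return curve


# Toy data: five predictions with supervisor confidence and correctness.
conf = [0.99, 0.95, 0.9, 0.6, 0.4]
correct = [True, True, False, True, False]
for coverage, risk in risk_coverage_curve(conf, correct):
    print(f"coverage={coverage:.1f} risk={risk:.2f}")
```

Reading the curve from left to right shows what is given up: accepting more inputs (higher coverage) generally, though not always monotonically, exposes the system to more errors (higher risk).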

Future work: Future work should test the outlier detectors on further real-world scenarios and investigate how a detector can be part of a safety argument.

Keywords: outlier detection, automotive perception

Chalmers Lindholmen, Jupiter building, room Omega
Opponent: Simon Burton, Fraunhofer Institute for Cognitive Systems, University of York


Jens Henriksson

Chalmers, Computer Science and Engineering, Interaction Design and Software Engineering

Automotive safety and machine learning: Initial results from a study on how to adapt the ISO 26262 safety standard

2018 IEEE/ACM 40th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion); Vol. May 2018 (2018), p. 47-49

Paper in proceedings

Towards Structured Evaluation of Deep Neural Network Supervisors

Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019; Vol. 1 (2019)

Paper in proceedings

Performance analysis of out-of-distribution detection on trained neural networks

Information and Software Technology; Vol. 130 (2021)

Article in scientific journal

Understanding the Impact of Edge Cases from Occluded Pedestrians for ML Systems

Proceedings - 2021 47th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2021 (2021), p. 316-325

Paper in proceedings

Ergo, SMIRK is safe: a safety case for a machine learning component in a pedestrian automatic emergency brake system

Software Quality Journal; Vol. 31 (2023), p. 335-403

Article in scientific journal

Evaluation of Out-of-Distribution Detection Performance on Autonomous Driving Datasets

2023 IEEE International Conference on Artificial Intelligence Testing (AITest); Vol. 2023 (2023)

Paper in proceedings

Deep learning has raised performance on object detection and semantic segmentation tasks well beyond that of traditional image processing methods. While the performance gains of deep learning are rapidly being adopted in non-critical domains, its inclusion in safety critical applications remains challenging, because such applications require rigorous testing before deployment. Testing of deep learning models is an evolving area, with several international standards under development that aim to guide testing procedures. However, the recommended testing methodologies have rarely been applied to actual safety critical experiments.

This thesis has investigated outlier detection, specifically out-of-distribution detection, as one testing methodology for safety critical applications. In a series of experiments of varying complexity, the method reduced the risk of misclassification. The thesis defines and argues for metrics that connect safety requirements with deep learning measures, so that the gap between the two fields is reduced. The studies conducted in this thesis show that there is a trade-off between the accepted risk in the deep neural network and the coverage of the model. Safety engineers can use this trade-off to restrict the system so that it only operates in scenarios that are within the accepted risk level of the model.
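Restricting the system to an accepted risk level amounts to choosing an operating point on the risk-coverage curve. The sketch below assumes per-sample confidence values and correctness flags from a validation set. Given a hypothetical risk budget `max_risk`, it returns the confidence threshold that gives the largest coverage whose empirical risk stays within the budget. Inputs below the threshold would be rejected or handed to a fallback. All names here are illustrative assumptions, not the thesis's procedure.

```python
from typing import Optional, Sequence, Tuple


def select_threshold(confidence: Sequence[float], correct: Sequence[bool],
                     max_risk: float) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, coverage, risk) for the largest coverage whose
    empirical risk on accepted samples is <= max_risk, or None if no cut
    meets the budget. Inputs with confidence >= threshold are accepted."""
    order = sorted(range(len(confidence)), key=lambda i: confidence[i], reverse=True)
    n = len(order)
    best = None
    errors = 0
    for k, idx in enumerate(order, start=1):
        errors += 0 if correct[idx] else 1
        risk = errors / k
        # Only cut between distinct confidence values, so that the threshold
        # accepts exactly the k most confident samples.
        is_cut = k == n or confidence[order[k]] < confidence[idx]
        if is_cut and risk <= max_risk:
            best = (confidence[idx], k / n, risk)
    return best


conf = [0.99, 0.95, 0.9, 0.6, 0.4]
correct = [True, True, False, True, False]
print(select_threshold(conf, correct, max_risk=0.25))
```

The selected threshold reflects empirical risk on a finite validation set and carries no statistical guarantee. A safety argument would typically also need confidence bounds and evidence that the validation data is representative of the operational domain.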


Computer Vision and Robotics (Autonomous Systems)



Doctoral theses at Chalmers University of Technology. New series: 5396



