Geometric Supervision and Deep Structured Models for Image Segmentation
Doctoral thesis, 2020

The task of semantic segmentation aims at understanding an image at a pixel level. Due to its applicability in many areas, such as autonomous vehicles, robotics and medical surgery assistance, semantic segmentation has become an essential task in image analysis. During the last few years a lot of progress has been made on image segmentation algorithms, mainly due to the introduction of deep learning methods, in particular the use of Convolutional Neural Networks (CNNs). CNNs are powerful for modeling complex connections between input and output data but have two drawbacks when it comes to semantic segmentation. Firstly, CNNs lack the ability to directly model dependent output structures, for instance, explicitly enforcing properties such as label smoothness and coherence. This drawback motivates the use of Conditional Random Fields (CRFs), applied as a post-processing step in semantic segmentation. Secondly, training CNNs requires large amounts of annotated data. For segmentation this amounts to dense, pixel-level annotations that are very time-consuming to acquire.

This thesis summarizes the content of five papers addressing the two aforementioned drawbacks of CNNs. The first two papers present methods that use geometric 3D models to improve segmentation models. The 3D models can be created with little human labour and can be used as a supervisory signal to improve the robustness of semantic segmentation and long-term visual localization methods.

The last three papers focus on models combining CNNs and CRFs for semantic segmentation. The models consist of a CNN capable of learning complex image features coupled with a CRF capable of learning dependencies between output variables. Emphasis has been on creating models that are possible to train end-to-end, giving the CNN and the CRF a chance to learn how to interact and exploit complementary information to achieve better performance.
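To make the idea of a CRF placed on top of per-pixel scores concrete, the following is a minimal sketch and not the model used in the thesis papers. It assumes a Potts pairwise term on a 4-connected grid, and the names `crf_energy`, `unary` and `pairwise_weight` are hypothetical. The unary costs stand in for what a CNN might output (for example negative log-probabilities).

```python
def crf_energy(labels, unary, pairwise_weight=1.0):
    """Energy of a labeling on a 4-connected pixel grid (lower is better).

    unary[r][c][k] is the cost of giving pixel (r, c) label k, for example a
    negative log-probability from a CNN. The pairwise term is a Potts penalty
    (an assumption made for this sketch): every pair of neighbouring pixels
    with different labels adds `pairwise_weight` to the energy.
    """
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for r in range(height):
        for c in range(width):
            energy += unary[r][c][labels[r][c]]
            # Count each edge once by looking only at right and down neighbours.
            if c + 1 < width and labels[r][c] != labels[r][c + 1]:
                energy += pairwise_weight
            if r + 1 < height and labels[r][c] != labels[r + 1][c]:
                energy += pairwise_weight
    return energy


# A 1x3 image with two labels. The middle pixel slightly prefers label 1,
# while its neighbours clearly prefer label 0.
unary = [[[0.1, 2.0], [0.6, 0.4], [0.1, 2.0]]]
noisy = [[0, 1, 0]]   # best label for each pixel taken independently
smooth = [[0, 0, 0]]  # spatially coherent labeling

print(crf_energy(noisy, unary), crf_energy(smooth, unary))
```

With the pairwise penalty switched on, the coherent labeling has lower energy than the per-pixel best labeling; with `pairwise_weight=0.0` the ordering flips. In an end-to-end setting both the unary costs and the pairwise weight would be learnt jointly rather than set by hand as they are here.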

conditional random fields

convolutional neural networks

deep structured models

semantic segmentation

self-supervised learning

supervised learning

semi-supervised learning

online participation
Opponent: Professor M. Pawan Kumar, Department of Engineering Science, University of Oxford

Author

Måns Larsson

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering, Digital Image Systems and Image Analysis

A cross-season correspondence dataset for robust semantic segmentation

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2019-June (2019), p. 9524-9534

Paper in proceedings

Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization

Proceedings of the IEEE International Conference on Computer Vision (2019), p. 31-41

Paper in proceedings

Revisiting Deep Structured Models for Pixel-Level Labeling with Gradient-Based Inference

SIAM Journal on Imaging Sciences, Vol. 11 (2018), p. 2610-2628

Journal article

Max-margin learning of deep structured models for semantic segmentation

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017), p. 28-40

Paper in proceedings

Robust abdominal organ segmentation using regional convolutional neural networks

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017), p. 41-52

Paper in proceedings

Understanding the content of an image is something humans excel at. If I were to ask you to describe the objects present in an image, you would in almost all cases manage that task effortlessly. However, if I ask you to state a set of rules to decide if an image contains a cat or a dog, you might have difficulties. Humans are so good at parsing and understanding visual scenes that we do not reflect on how we do it.

In Computer Vision the goal is to automatically extract meaningful information from an image, in some way automating tasks that the human visual system can perform. This is applicable in many areas; for example, a self-driving car can utilize cameras and computer vision to perceive and understand its surroundings. For medical applications, automatic interpretation of images can be helpful for diagnosis or surgery planning.

The topic of this thesis is image segmentation, which aims at understanding an image at a pixel level. The goal is to assign a label to each pixel, describing the object it depicts. During the last few years, the dominant approaches to segmentation have been based on deep learning. In deep learning, large models with many parameters, called neural networks, are used to produce a segmentation for an input image. For a model to produce useful results, its parameters need to be learnt from a large dataset containing pairs of images and manually created segmentations.
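As a small illustration of what "assigning a label to each pixel" means in practice, the sketch below turns per-pixel class scores into a label map by picking the highest-scoring class at every pixel. The class names, the scores and the function name `label_pixels` are made up for this example; in a real system the scores would come from a trained neural network.

```python
# Hypothetical class names, chosen only for illustration.
CLASSES = ["road", "car", "sky"]


def label_pixels(scores):
    """Return a 2D label map with the highest-scoring class index per pixel.

    scores[r][c] is a list holding one score per class for pixel (r, c).
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# A 2x2 "image" with one score per class at every pixel (made-up numbers).
scores = [
    [[0.1, 0.2, 0.7], [0.2, 0.1, 0.7]],
    [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1]],
]
label_map = label_pixels(scores)
named = [[CLASSES[k] for k in row] for row in label_map]
print(named)
```

Each pixel is labeled independently here, which is exactly the behaviour that a structured model such as a CRF is meant to improve on.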

In this thesis, neural networks are combined with a type of statistical model called a Conditional Random Field (CRF). CRFs are good at modeling dependencies within the output labels of a model, so they can learn dependencies such as "hat pixels are likely to be above face pixels". In addition, methods that use 3D models to train segmentation methods to be more robust to seasonal changes have been developed. Robust segmentation methods are crucial for applications such as self-driving cars, where the system needs to be able to interpret its surroundings reliably during all seasons of the year.

Perceptron

VINNOVA, 2017-06-01 -- 2019-11-30.

Areas of Advance

Information and Communication Technology

Subject Categories

Computer Vision and Robotics (Autonomous Systems)

ISBN

978-91-7905-294-2

Doctoral theses at Chalmers University of Technology. New series: 4761

Publisher

Chalmers University of Technology

Online

More information

Last updated

2020-05-05