Equivariant deep learning with applications in computer vision
Doctoral thesis, 2024

We study equivariant deep learning for image data. A neural network is said to be equivariant to a transformation of its input if transforming the input results in a corresponding, predictable transformation of the network output. In the first part of this thesis we cover key definitions and background theory in the hope of providing a useful reference for newcomers to the field.
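As a minimal illustration of this definition (a toy numpy sketch, not taken from the thesis), consider an elementwise map acting on a vector: permuting the input and then applying the map gives the same result as applying the map and then permuting the output, so the map is permutation equivariant.

```python
import numpy as np

# Toy equivariance check: f(g . x) == g . f(x),
# where f is an elementwise map and g acts by permuting entries.
def f(x):
    return x ** 2  # any elementwise function works here

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
perm = rng.permutation(5)  # the group element g

left = f(x[perm])    # transform the input, then apply the network
right = f(x)[perm]   # apply the network, then transform the output
assert np.allclose(left, right)  # f is permutation equivariant
```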

Papers A and B study equivariance in the context of keypoint description. We introduce the concept of steerers, which are transformations of keypoint descriptions that correspond to specific transformations of the input images. We argue why rotation steerers appear naturally when training keypoint descriptor neural networks. Further, we propose affine steerers for arbitrary differentiable image transformations.
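The steerer idea can be sketched with a toy descriptor (an illustrative construction, not the trained networks of Papers A and B): sample a patch at N angles on a circle and take one Fourier coefficient of the samples as the descriptor. Rotating the patch cyclically shifts the samples, which multiplies the coefficient by a fixed phase; that phase multiplication is the steerer.

```python
import numpy as np

N = 8                            # number of angular samples on a circle
rng = np.random.default_rng(0)
patch = rng.standard_normal(N)   # toy "image patch" sampled at N angles

def descriptor(p):
    # Toy descriptor: the first Fourier coefficient of the angular samples.
    return np.sum(p * np.exp(-2j * np.pi * np.arange(N) / N))

s = 3                            # rotate by s * (360 / N) degrees
rotated = np.roll(patch, -s)     # rotation = cyclic shift of the samples

# The steerer: a fixed phase factor, independent of the patch content.
steerer = np.exp(2j * np.pi * s / N)
assert np.allclose(descriptor(rotated), steerer * descriptor(patch))
```

The key point is that the steerer depends only on the rotation, not on the image, so it can be applied directly to stored descriptors without recomputing them.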

Paper C studies to what extent each layer of a convolutional neural network becomes close to reflection equivariant when trained on natural images. We find that the closer a trained network is to being reflection equivariant as a whole, the closer each of its layers is to satisfying an equivariance constraint.

Convolutional neural networks are equivariant to translations of the input. In paper D, we show that image translations do not in general correspond to rigid motions of the camera taking the images. Instead, we propose methods to make neural networks more equivariant to rotational homographies, the only scene-independent image transformations induced by rigid camera motions.
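The translation equivariance of convolutions can be verified directly in a small numpy sketch (using circular convolution so that translations act exactly as a group):

```python
import numpy as np

# A convolution commutes with (circular) translations of its input:
# conv(shift(x)) == shift(conv(x)).
rng = np.random.default_rng(0)
x = rng.standard_normal(16)   # 1D "image"
k = rng.standard_normal(5)    # convolution kernel

def circular_conv(signal, kernel):
    n = len(signal)
    # Convolve on a circle via the FFT so translations are exact.
    return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel, n)))

t = 4  # translate by t samples
assert np.allclose(circular_conv(np.roll(x, t), k),
                   np.roll(circular_conv(x, k), t))
```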

In paper E, we demonstrate that a state-of-the-art (semi-)dense image-matching neural network can be made close to rotation invariant without a drop in performance on upright images. This is done by replacing the backbone feature extractor with a neural network that is constrained to be equivariant in each layer.

EL41, EDIT-huset
Opponent: Prof. Kostas Daniilidis, University of Pennsylvania

Author

Georg Bökman

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Deep artificial neural networks (DNNs) have become basic tools for many tasks. Despite their broad usage, ranging from computational biology to chatbots, much is still unknown about how and when DNNs can be applied and how they work. In this thesis, we explore how to use knowledge about symmetries in the task at hand to improve the performance and understanding of DNNs applied to image data.

A DNN is a flexible computer program that can execute many different algorithms, depending on how its internal parameters are set. Given examples of inputs and expected outputs of a complicated algorithm, we can tune the parameters of a DNN to approximate the algorithm. DNNs thus enable the construction of computer programs in cases where it is difficult to write down an explicit algorithm but where it is possible to obtain example inputs and outputs, e.g., through human labelling.

Many tasks have inherent symmetries. Image classification, for example, is invariant to reflecting the image horizontally: a mirrored photo still shows the same objects. When determining the direction a car is driving in an image, on the other hand, the predicted direction should flip if the image is reflected. Informally, an equivariant algorithm respects the symmetry of the task. In this thesis, using the lens of symmetries leads to improved DNNs for image matching and an increased understanding of the inner workings of DNNs.
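The two examples above can be mimicked with toy stand-ins (purely illustrative quantities, not real classifiers): total brightness plays the role of an invariant output, and the sign of the summed horizontal gradient plays the role of a direction that flips under reflection.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((4, 6))     # toy "image"
flipped = img[:, ::-1]       # horizontal reflection

# Invariant quantity: total brightness does not change under reflection.
brightness = img.sum()
assert np.isclose(brightness, flipped.sum())

# Equivariant quantity: the sign of the summed horizontal gradient
# flips when the image is reflected.
direction = np.sign(np.diff(img, axis=1).sum())
assert direction == -np.sign(np.diff(flipped, axis=1).sum())
```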

Subject Categories

Computer Engineering

Signal Processing

Computer Vision and Robotics (Autonomous Systems)

ISBN

978-91-8103-102-7

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5560

Publisher

Chalmers

More information

Latest update: 9/6/2024