Equivariant deep learning with applications in computer vision
Doctoral thesis, 2024
Papers A and B study equivariance in the context of keypoint description. We introduce the concept of steerers, which are transformations of keypoint descriptions that correspond to specific transformations of the input images. We argue why rotation steerers appear naturally when training keypoint descriptor neural networks. Further, we propose affine steerers for arbitrary differentiable image transformations.
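In symbols, the steerer property described above can be sketched as follows. The notation is an assumption made for illustration and does not appear in the abstract: $f$ denotes the keypoint descriptor network, $I$ an image, $g$ an image transformation, and $\rho(g)$ the steerer acting on descriptions. The sketch ignores that keypoint locations also move under $g$.

```latex
% Describing the transformed image gives the same result as
% transforming the original description with the steerer \rho(g).
f(g \cdot I) = \rho(g)\, f(I)
```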
Paper C studies to what extent each layer of a convolutional neural network becomes close to reflection equivariant when trained on natural images. We find that the closer a convolutional neural network is to being equivariant, the closer it is to satisfying an equivariance constraint in each layer.
Convolutional neural networks are equivariant to translations of the input. In paper D, we show that image translations do not correspond to rigid motions of the camera taking the images. Instead, we propose methods to make neural networks more equivariant to rotational homographies, the only image transformations corresponding to rigid camera motions.
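The translation equivariance stated at the start of this paragraph can be checked numerically on a toy example. The following is a minimal sketch, not code from the thesis: it assumes periodic (wrap-around) boundaries so that the property holds exactly, and the helper names `circular_correlate` and `translate` are hypothetical.

```python
import random


def circular_correlate(image, kernel):
    """Apply a convolution-layer style filter with wrap-around boundaries."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(
                kernel[i][j] * image[(y + i) % h][(x + j) % w]
                for i in range(kh)
                for j in range(kw)
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def translate(image, dy, dx):
    """Shift an image by (dy, dx) pixels with wrap-around boundaries."""
    h, w = len(image), len(image[0])
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


random.seed(0)
image = [[random.random() for _ in range(8)] for _ in range(8)]
kernel = [[random.random() for _ in range(3)] for _ in range(3)]

# Equivariance: filtering a shifted image equals shifting the filtered image.
shift_then_filter = circular_correlate(translate(image, 2, 3), kernel)
filter_then_shift = translate(circular_correlate(image, kernel), 2, 3)

translation_equivariant = all(
    abs(a - b) < 1e-9
    for row_a, row_b in zip(shift_then_filter, filter_then_shift)
    for a, b in zip(row_a, row_b)
)
print(translation_equivariant)
```

With zero padding instead of periodic boundaries, the two results would generally differ near the image border, so the property then holds only approximately.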
In paper E, we demonstrate that a state-of-the-art (semi-)dense image-matching neural network can be made close to rotation invariant without a drop in performance on upright images. This is done by replacing the backbone feature extractor with a neural network that is constrained to be equivariant in each layer.
Author
Georg Bökman
Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering
A deep neural network (DNN) is a flexible computer program that can execute many different algorithms, depending on how its internal parameters are set. Given examples of the input and expected output of a complicated algorithm, we can tune the parameters of a DNN to approximate that algorithm. DNNs thus make it possible to construct computer programs in cases where it is difficult to write down an explicit algorithm but possible to obtain example inputs and outputs, e.g., through human labelling.
Many tasks have inherent symmetries. For example, image classification is invariant to reflecting the image horizontally: the class label does not change. In contrast, when determining the direction in which a car is driving in an image, the predicted direction should flip if the image is reflected. Informally, an equivariant algorithm respects the symmetry of the task. In this thesis, viewing DNNs through the lens of symmetry leads to improved DNNs for image matching and an increased understanding of their inner workings.
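The difference between an invariant and an equivariant quantity under horizontal reflection can be illustrated with a toy example. This is a minimal sketch under stated assumptions: images are plain lists of pixel rows, and the functions `total_brightness` and `horizontal_direction` are hypothetical stand-ins for a classifier and a direction estimator, not methods from the thesis.

```python
def reflect(image):
    """Mirror an image (a list of pixel rows) horizontally."""
    return [row[::-1] for row in image]


def total_brightness(image):
    """Toy invariant output: the sum of all pixels ignores their positions."""
    return sum(sum(row) for row in image)


def horizontal_direction(image):
    """Toy equivariant output: +1 if the right half is brighter, -1 if the left half is."""
    w = len(image[0])
    left = sum(sum(row[: w // 2]) for row in image)
    right = sum(sum(row[(w + 1) // 2:]) for row in image)
    return (right > left) - (right < left)


image = [
    [0, 0, 1, 3],
    [0, 1, 2, 4],
]
mirrored = reflect(image)

# Invariance: the output is the same for the image and its mirror image.
invariant_ok = total_brightness(mirrored) == total_brightness(image)

# Equivariance: the output flips sign when the image is mirrored.
equivariant_ok = horizontal_direction(mirrored) == -horizontal_direction(image)

print(invariant_ok, equivariant_ok)
```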
Subject Categories
Computer Engineering
Signal Processing
Computer Vision and Robotics (Autonomous Systems)
ISBN
978-91-8103-102-7
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5560
Publisher
Chalmers
EL41, EDIT-huset
Opponent: Prof. Kostas Daniilidis, University of Pennsylvania