Learning and Optimizing Camera Pose

Lucas Brynte

Doctoral thesis, 2024

Many computer vision applications involve estimating the position and orientation, i.e. the pose, of one or several cameras, including object pose estimation, visual localization, and structure-from-motion. Traditionally, such problems have often been addressed by detecting, extracting, and matching image keypoints using handcrafted local image features such as the scale-invariant feature transform (SIFT), followed by robust fitting and/or optimization to determine the unknown camera pose(s). Learning-based models have the advantage that they can learn from data which cues or patterns are relevant for the task, beyond what an engineer might anticipate. However, compared with 2D vision tasks such as image classification and object detection, applying machine learning models to 3D vision tasks such as pose estimation has proven more challenging.

In this thesis, I explore pose estimation methods based on machine learning and optimization, with respect to quality, robustness, and efficiency. First, an efficient and powerful graph attention network model for learning structure-from-motion is presented, taking image point tracks as input. The model is shown to generalize to novel scenes without costly fine-tuning of network parameters. Combined with bundle adjustment, it yields accurate reconstructions significantly faster than off-the-shelf incremental structure-from-motion pipelines. Second, techniques are presented for improving the equivariance properties of convolutional neural network models for pose estimation. These techniques either intentionally apply radial distortion to images to reduce perspective effects, or use a geometrically sound data augmentation scheme corresponding to camera motion. Next, the power and limitations of semidefinite relaxations of pose optimization problems are explored. Notably, absolute camera pose estimation is not necessarily solvable using the considered semidefinite relaxations: although they are almost always tight in practice, counter-examples do exist. Finally, a rendering-based object pose refinement method is presented, which is robust to partial occlusion due to its implicit nature. This is followed by a method for long-term visual localization that leverages a semantic segmentation model to increase robustness by promoting semantic consistency of sampled point correspondences.

Keywords: optimization, structure-from-motion, camera pose estimation, machine learning

Opponent: Prof. Dr. Konrad Schindler, Photogrammetry and Remote Sensing, ETH Zürich, Switzerland

Author

Lucas Brynte

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Other publications

Semantic Match Consistency for Long-Term Visual Localization

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 11206 LNCS (2018), pp. 391-408

Paper in proceedings

Pose Proposal Critic: Robust Pose Refinement by Learning Reprojection Errors

31st British Machine Vision Conference, BMVC 2020 (2020)

Paper in proceedings

On the Tightness of Semidefinite Relaxations for Rotation Estimation

Journal of Mathematical Imaging and Vision, Vol. 64 (2022), pp. 57-67

Article in scientific journal

Rigidity Preserving Image Transformations and Equivariance in Perspective

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 13886 LNCS (2023), pp. 59-76

Paper in proceedings

Learning Structure-from-Motion with Graph Attention Networks

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2024), pp. 4808-4817

Paper in proceedings

Computer vision concerns developing visual capabilities for machines and robots, similar to human visual perception. If a camera substitutes for the human eye, computer vision substitutes for the visual cortex in the brain. Computer vision encompasses many tasks, including semantic understanding of image contents as well as 3D tracking and mapping. A core concept of the latter is camera pose, meaning the position and orientation of a camera in 3D space. The thesis studies different methods for estimating camera pose, in particular machine learning methods, but also mathematical optimization methods. Estimating camera pose is relevant for tracking the movement of objects visible in an image or the movement of the camera itself, and for estimating the relative position and orientation between multiple cameras.
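Camera pose can be made concrete with the standard pinhole model. A camera with rotation R and translation t maps a world point X to camera coordinates RX + t. An intrinsic matrix K then projects those coordinates to pixels, x ~ K(RX + t). The sketch below is a minimal illustration only. It assumes this common world-to-camera convention, and the thesis may use other conventions. The intrinsic matrix values and the helper names `rot_z`, `matvec`, and `project` are hypothetical.

```python
import math

def rot_z(theta):
    """Rotation matrix about the z-axis (angle in radians), as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matvec(A, v):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def project(K, R, t, X):
    """Project world point X to pixel coordinates with pose (R, t).

    Assumed convention (hypothetical for this sketch): camera coordinates
    are X_c = R X + t, and pixels are x ~ K X_c after dividing by depth.
    """
    Xc = [a + b for a, b in zip(matvec(R, X), t)]  # world -> camera frame
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    u, v, w = matvec(K, Xc)  # apply intrinsics
    return (u / w, v / w)    # perspective division

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = rot_z(0.0)        # no rotation
t = [0.0, 0.0, 2.0]   # world origin lies 2 units in front of the camera
print(project(K, R, t, [0.2, -0.1, 0.0]))  # (370.0, 215.0)
```

Estimating camera pose amounts to inverting this mapping, that is, recovering R and t from observed 2D projections of known or jointly estimated 3D points.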

Machine learning is one of the most prominent research fields today and the primary approach to developing artificial intelligence systems. While many computer vision problems have been revolutionized by machine learning in recent years, vision tasks involving 3D geometric reasoning, such as camera pose estimation, have proven more challenging to learn. The thesis presents several contributions that improve the performance of learning-based pose estimation. In addition, it presents methods that combine machine learning and optimization, and methods that explore the power and limitations of semidefinite relaxations, a particular optimization strategy for pose estimation.

Deep learning for 3D recognition

Wallenberg AI, Autonomous Systems and Software Program, from 2018-01-01.


Areas of Advance

Information and Communication Technology

Subject Categories (SSIF 2011)

Computer Vision and Robotics (Autonomous Systems)

ISBN

978-91-7905-973-6

Doctoral theses at Chalmers University of Technology. New series: 5439

Publisher

Chalmers



Last updated

2024-01-08
