Learning and Optimizing Camera Pose
Doctoral thesis, 2024
Many computer vision applications involve estimating the position and orientation, i.e. the pose, of one or several cameras, including object pose estimation, visual localization, and structure-from-motion. Traditionally, such problems have often been addressed by detection, extraction, and matching of image keypoints, using handcrafted local image features such as the scale-invariant feature transform (SIFT), followed by robust fitting and/or optimization to determine the unknown camera pose(s). Learning-based models have the advantage that they can learn from data what cues or patterns are relevant for the task, beyond the imagination of the engineer. However, compared with 2D vision tasks such as image classification and object detection, applying machine learning models to 3D vision tasks such as pose estimation has proven to be more challenging.
In this thesis, I explore pose estimation methods based on machine learning and optimization, from the aspects of quality, robustness, and efficiency. First, an efficient and powerful graph attention network model for learning structure-from-motion is presented, taking image point tracks as input. Generalization to novel scenes is then demonstrated, without costly fine-tuning of network parameters. Combined with bundle adjustment, accurate reconstructions are acquired, significantly faster than with off-the-shelf incremental structure-from-motion pipelines. Second, techniques are presented for improving the equivariance properties of convolutional neural network models carrying out pose estimation, either by intentionally applying radial distortion to images to reduce perspective effects, or via a geometrically sound data augmentation scheme corresponding to camera motion. Next, the power and limitations of semidefinite relaxations of pose optimization problems are explored, notably leading to the conclusion that absolute camera pose estimation is not necessarily solvable using the considered semidefinite relaxations: although they are almost always tight in practice, counter-examples do exist. Finally, a rendering-based object pose refinement method is presented, robust to partial occlusion due to its implicit nature, followed by a method for long-term visual localization, which leverages a semantic segmentation model to increase robustness by promoting semantic consistency of sampled point correspondences.
optimization
structure-from-motion
camera pose estimation
machine learning
Author
Lucas Brynte
Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering
Semantic Match Consistency for Long-Term Visual Localization
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 11206 LNCS (2018), p. 391-408
Paper in proceeding
Pose Proposal Critic: Robust Pose Refinement by Learning Reprojection Errors
Proceedings of the British Machine Vision Conference 2020 (2020)
Paper in proceeding
On the Tightness of Semidefinite Relaxations for Rotation Estimation
Journal of Mathematical Imaging and Vision, Vol. 64 (2022), p. 57-67
Journal article
Rigidity Preserving Image Transformations and Equivariance in Perspective
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 13886 LNCS (2023), p. 59-76
Paper in proceeding
Machine learning is one of the most prominent research fields of today, being the primary method for developing artificial intelligence systems. While many computer vision challenges have been revolutionized by machine learning in recent years, vision tasks involving 3D geometrical reasoning, such as camera pose estimation, have proven more challenging to learn. The thesis presents several contributions that demonstrate performance improvements for learning-based pose estimation. In addition, methods are presented that combine machine learning and optimization, and that explore the power and limitations of semidefinite relaxations, a certain type of optimization strategy for pose optimization.
Deep Learning for 3D Recognition
Wallenberg AI, Autonomous Systems and Software Program, 2018-01-01 -- .
Areas of Advance
Information and Communication Technology
Subject Categories
Computer Vision and Robotics (Autonomous Systems)
ISBN
978-91-7905-973-6
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5439
Publisher
Chalmers
EC
Opponent: Prof. Dr. Konrad Schindler, Photogrammetry and Remote Sensing, ETH Zürich, Switzerland