Towards Reliable and Accurate Global Structure-from-Motion
Doctoral thesis, 2023

Reconstructing objects or scenes from sparse point detections across multiple views is one of the most studied problems in computer vision. Given the coordinates of 2D points tracked in multiple images, the problem consists of estimating the corresponding 3D points and the cameras' calibrations (intrinsics and pose), and can be solved by minimizing reprojection errors using bundle adjustment. However, because bundle adjustment has a nonlinear objective function and is solved iteratively, a good initial guess is required to converge to a global minimum.
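
As an illustration of the quantity that bundle adjustment minimizes, the following is a minimal sketch of a sum of squared reprojection errors. It assumes a plain pinhole model with 3x4 camera matrices and no lens distortion; the function names (`project`, `reprojection_error`) and the toy cameras and points are hypothetical and are not taken from the thesis.

```python
import numpy as np

def project(P, X):
    """Project 3D points X of shape (N, 3) with a 3x4 camera matrix P.

    Returns an (N, 2) array of image coordinates (pinhole model, no distortion).
    """
    X_h = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous coordinates
    x_h = X_h @ P.T
    return x_h[:, :2] / x_h[:, 2:3]  # divide by the projective depth

def reprojection_error(cameras, points, observations):
    """Sum of squared reprojection errors.

    observations is a list of (camera_index, point_index, observed_xy).
    """
    total = 0.0
    for i, j, xy in observations:
        predicted = project(cameras[i], points[j:j + 1])[0]
        total += float(np.sum((predicted - np.asarray(xy)) ** 2))
    return total

# Toy scene (assumed values): two cameras sharing intrinsics K, two 3D points.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
cameras = [P0, P1]
points = np.array([[0.0, 0.0, 5.0],
                   [1.0, -1.0, 4.0]])

# Noise-free observations generated from the true cameras and points.
observations = [(i, j, project(cameras[i], points[j:j + 1])[0])
                for i in range(2) for j in range(2)]

exact = reprojection_error(cameras, points, observations)
perturbed = reprojection_error(cameras, points + 0.1, observations)
print(exact, perturbed)
```

The error vanishes at the true structure and grows once the points are perturbed. A bundle adjustment solver would minimize this function jointly over cameras and points with an iterative nonlinear least-squares method, which is why the starting guess matters.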

Global and Incremental Structure-from-Motion (SfM) methods are two ways of providing good initializations to bundle adjustment, each with different properties. While Global SfM has been shown to result in more accurate reconstructions than Incremental SfM, the latter scales better: it starts with a small subset of images and sequentially adds new views, allowing reconstruction of sequences with millions of images. Additionally, both Global and Incremental SfM methods rely on accurate models of the scene or object, and under noisy conditions or high model uncertainty they might produce poor initializations for bundle adjustment. Recently pOSE, a class of matrix factorization methods, has been proposed as an alternative to conventional Global SfM methods. These methods use VarPro, a second-order optimization method, to minimize a linear combination of an approximation of reprojection errors and a regularization term based on an affine camera model, and have been shown to converge to global minima at a high rate even when starting from random camera calibration estimates.
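
To make the "linear combination" concrete, the sketch below evaluates a pOSE-style objective on a toy example. It is an assumption based on the commonly stated form of pOSE (an object space error that avoids the division by depth, blended with an affine-camera term), not a reproduction of the thesis's formulation. The weight name `eta` and the function name `pose_objective` are hypothetical, and the VarPro solver itself is not implemented here.

```python
import numpy as np

def pose_objective(cameras, points, observations, eta=0.1):
    """Evaluate a pOSE-style objective (assumed form, see the note above).

    cameras: list of 3x4 matrices, points: (N, 3) array,
    observations: list of (camera_index, point_index, observed_xy).
    """
    total = 0.0
    for i, j, m in observations:
        m = np.asarray(m, dtype=float)
        y = cameras[i] @ np.append(points[j], 1.0)  # homogeneous projection (3,)
        # Object space error: reprojection residual multiplied by the depth y[2],
        # which makes it bilinear in cameras and points.
        ose = y[:2] - y[2] * m
        # Affine regularizer: residual of an affine camera (depth fixed to 1).
        affine = y[:2] - m
        total += (1.0 - eta) * (ose @ ose) + eta * (affine @ affine)
    return float(total)

# Toy example (assumed values): one normalized camera [I | 0], one point at depth 2.
cameras = [np.hstack([np.eye(3), np.zeros((3, 1))])]
points = np.array([[0.4, -0.2, 2.0]])
observations = [(0, 0, (0.2, -0.1))]  # exact perspective projection of the point

unbiased = pose_objective(cameras, points, observations, eta=0.0)
biased = pose_objective(cameras, points, observations, eta=0.1)
print(unbiased, biased)
```

With `eta=0.0` the exact reconstruction has zero cost, while with `eta=0.1` the affine term adds a penalty of about 0.005 because the point's depth differs from 1. This small example shows how a regularizer based on an affine camera model can bias the solution.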

This thesis aims to improve the reliability and accuracy of global SfM through three approaches. First, by studying conditions for global optimality of point set registration, a point cloud averaging method that can be used when (incomplete) 3D point clouds of the same scene in different coordinate systems are available. Second, by extending pOSE methods to different Structure-from-Motion problem instances, such as Non-Rigid SfM or radial distortion invariant SfM. Third and finally, by replacing the regularization term of pOSE methods with an exponential regularization on the projective depth of the estimated 3D points, resulting in a loss that achieves reconstructions with accuracy close to that of bundle adjustment.

bundle adjustment

point set registration

camera calibration

global SfM

Structure-from-Motion

radial distortion

3D reconstruction

non-rigid SfM

pOSE

matrix factorization

EA room, EDIT building
Opponent: Mathieu Salzmann

Author

José Pedro Lopes Iglesias

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Global Optimality for Point Set Registration Using Semidefinite Programming

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2020 (2020), p. 8284-8292

Paper in proceeding

Accurate Optimization of Weighted Nuclear Norm for Non-Rigid Structure from Motion

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 12372 (2020), p. 21-37

Paper in proceeding

Bilinear Parameterization for Non-Separable Singular Value Penalties

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2021), p. 3896-3905

Paper in proceeding

Radial Distortion Invariant Factorization for Structure from Motion

Proceedings of the IEEE International Conference on Computer Vision (2021), p. 5886-5895

Paper in proceeding

Vision is typically our preferred way to understand and navigate the world we live in. It allows us to do things like estimate the position and shape of objects around us, find threats in our environment, or even appreciate beautiful artwork. In order to provide robots or computer software with similar capabilities, the research area of computer vision has been studying ways to replicate the processes happening inside our brains using concepts from mathematics, geometry, and computer science.

Recent developments in computer vision, in particular those related to artificial intelligence, have given us extremely capable learning-based models that have been received by society with both amazement and caution. We can now use these models to generate incredibly realistic images from text prompts, detect objects of interest in images or videos, and ask questions about particular images to chatbots that could without a doubt pass the Turing test with flying colors. However, learning-based methods still do not outperform conventional methods in tasks that rely heavily on exact geometric relations, such as 3D reconstruction or pose estimation from images. These two tasks are of great importance in applications such as autonomous driving or augmented and virtual reality, as they enable us to estimate the 3D models of objects or scenes as well as our position and orientation in relation to them.

In this work, I build upon conventional methods based on geometry, linear algebra, and optimization, with the goal of improving the reliability and accuracy of Structure-from-Motion, a problem that simultaneously solves 3D reconstruction and pose estimation from keypoints detected and matched across multiple images of a scene. My aim is to extend the range of use cases to which these methods can be applied, and with that get us slightly closer to performance levels that could unlock the next generation of real-life applications.

Optimization Methods with Performance Guarantees for Subspace Learning

Swedish Research Council (VR) (2018-05375), 2019-01-01 -- 2022-12-31.

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

Subject Categories

Robotics

Signal Processing

ISBN

978-91-7905-863-0

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5329

Publisher

Chalmers

Online

More information

Latest update

5/22/2023