Geometry and Learning in 3D Computer Vision

Yaroslava Lochman

doi:10.63959/chalmers.dt/5766

Doctoral thesis, 2025

This thesis studies and improves the accuracy, reliability, and efficiency of 3D vision pipelines. We leverage techniques from geometry, optimization, and deep machine learning, and we examine when combining them benefits the overall success of a 3D reconstruction system and when it does not. In modern computer vision, deep neural networks are often used as black boxes, not only for perception but also for solving geometric problems. Their performance depends heavily on the amount and quality of the data, and the results can sometimes be surprisingly poor. Classic geometric models and optimization techniques in 3D vision are much better understood. While they are still preferred in many applications, their learning-based counterparts show remarkable improvement over traditional methods on certain challenging tasks.

The thesis is structured around three problems: (1) camera calibration, (2) rotation averaging, and (3) motion segmentation. For each of these problems, we analyze the weak points and failure modes of existing methods and propose new algorithms. These algorithms leverage either standard techniques from geometry and optimization or hybrid learning pipelines that aim to retain the interpretability of geometric models while benefiting from the expressivity and adaptability of deep neural networks.

Our contributions include: (i) a versatile pipeline for calibrating central cameras with various lens configurations that relies on simple techniques and solvers and proves to be very stable, (ii) a semidefinite program for anisotropic rotation averaging that leverages the readily available uncertainties of the relative estimates and relies on a new convex relaxation, leading to improved reconstruction accuracy, (iii) a fast block-coordinate descent solver for anisotropic rotation averaging that achieves similar reconstruction accuracy while significantly reducing the runtime, (iv) robustification pipelines for anisotropic rotation averaging that tolerate gross outliers in the data, and (v) a metric learning approach addressing the challenging chicken-and-egg problem of motion segmentation via clustering in the space of trajectory feature representations, where inference is done in a fraction of a second.

computer vision

robust optimization

minimal solvers

rotation averaging

3D reconstruction

camera calibration

global structure from motion

motion segmentation

trajectory clustering

HB2, Hörsalsvägen 8, Chalmers

Opponent: Docent and Associate Professor Per-Erik Forssén, Linköping University, Sweden

Online disputation

Author

Yaroslava Lochman

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

BabelCalib: A Universal Approach to Calibrating Central Cameras

Proceedings of the IEEE International Conference on Computer Vision (2021), p. 15233-15242

Paper in proceedings

Certifiably Optimal Anisotropic Rotation Averaging

Proceedings of the 2025 IEEE/CVF International Conference on Computer Vision (2025), p. 14856-14865

Paper in proceedings

Making Rotation Averaging Fast and Robust with Anisotropic Coordinate Descent

Proceedings of the 36th British Machine Vision Conference 2025 (2025)

Paper in proceedings

Learned Trajectory Embedding for Subspace Clustering

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2024), p. 19092-19102

Paper in proceedings

Imagine you are a civil engineer working on the architectural restoration of a historic building. You would like to obtain its 3D model to analyze the building structure and plan the work. You collect a set of images of the building and provide them to 3D reconstruction software (a system that recovers the geometric structure of a scene from a set of captured images), and it will hopefully do the rest of the work for you. Or maybe not. Maybe the building has many symmetries that confuse the system, causing it to fail... Or imagine that you are building a robot. You want to attach cameras to it and program it to navigate autonomously in its environment. You want to use fewer cameras, so you choose fisheye cameras, which have a wide field of view. Again, you use similar software for visual localization, and it fails, in fact already at the calibration stage, because the software could not find a good starting point for your cameras... Wouldn't it be nice if reconstruction systems were fully automatic and worked seamlessly? In this thesis, I aim to take a small step towards that goal. I study ways to leverage geometry, optimization, and deep machine learning to improve the accuracy, reliability, and overall performance of 3D reconstruction systems.

Subject categories (SSIF 2025)

Computer Graphics and Computer Vision

DOI

10.63959/chalmers.dt/5766

ISBN

978-91-8103-309-0

Doctoral theses at Chalmers University of Technology. New series: 5766

Publisher

Chalmers

More information

Last updated

2025-12-02
