Geometry and Learning in 3D Computer Vision
Doctoral thesis, 2025

This thesis studies and improves the accuracy, reliability, and efficiency of 3D vision pipelines. We leverage techniques from geometry, optimization, and deep machine learning, and we examine when combining them benefits the overall success of a 3D reconstruction system and when it does not. In modern computer vision, deep neural networks are often used as black boxes, not only for perception but also for solving geometric problems. Their performance depends heavily on the amount and quality of the data, and the results can sometimes be surprisingly poor. Classic geometric models and optimization techniques in 3D vision are much better understood. While they are still preferred in many applications, their learning-based counterparts show substantial improvements over traditional methods on certain challenging tasks.

The thesis is structured around three problems: (1) camera calibration, (2) rotation averaging, and (3) motion segmentation. For each problem, we analyze the weak points and failure modes of existing methods and propose either new algorithms that leverage standard techniques from geometry and optimization, or hybrid learning pipelines that aim to retain the interpretability of geometric models while benefiting from the expressivity and adaptability of deep neural networks.

Our contributions include: (i) a versatile pipeline for calibrating central cameras with various lens configurations that relies on simple techniques and solvers and proves to be very stable, (ii) a semidefinite program for anisotropic rotation averaging that leverages the readily available uncertainties of the relative estimates and relies on a new convex relaxation, leading to improved reconstruction accuracy, (iii) a fast block-coordinate descent solver for anisotropic rotation averaging that achieves similar reconstruction accuracy while significantly reducing the runtime, (iv) robustification pipelines for anisotropic rotation averaging that tolerate gross outliers in the data, and (v) a metric learning approach that addresses the challenging chicken-and-egg problem of motion segmentation via clustering in the space of trajectory feature representations, with inference taking a fraction of a second.

camera calibration

rotation averaging

3D reconstruction

computer vision

global structure from motion

minimal solvers

motion segmentation

robust optimization

trajectory clustering

HB2, Hörsalsvägen 8, Chalmers
Opponent: Docent and Associate Professor Per-Erik Forssén, Linköping University, Sweden

Author

Yaroslava Lochman

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Imagine you are a civil engineer working on the architectural restoration of a historic building. You would like to obtain its 3D model to analyze the building structure and plan the work. You collect a set of images of the building and provide them to 3D reconstruction software (a system that recovers the geometric structure of a scene from a set of captured images), and it will hopefully do the rest of the work for you. Or maybe not. Maybe the building has many symmetries that confuse the system, causing it to fail... Or, imagine that you are building a robot. You want to attach cameras to it and program it to navigate autonomously in its environment. You want to use fewer cameras, so you choose fisheye cameras, which have a wide field of view. Again, you use similar software for visual localization, and it fails, in fact already at the calibration stage, because the software could not find a good starting point for your cameras... Wouldn't it be nice if reconstruction systems were fully automatic and worked seamlessly? In this thesis, I aim to take a small step towards that goal. I study ways to leverage geometry, optimization, and deep machine learning to improve the accuracy, reliability, and overall performance of 3D reconstruction systems.

Subject categories (SSIF 2025)

Computer graphics and computer vision

DOI

10.63959/chalmers.dt/5766

ISBN

978-91-8103-309-0

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5766

Publisher

Chalmers

Online

More information

Last updated

2025-11-20