Towards Robust Visual Localization in Challenging Conditions
Doctoral thesis, 2020

Visual localization is a fundamental problem in computer vision, with a multitude of applications in robotics, augmented reality and structure-from-motion. The basic problem is to determine, from one or more images, the position and orientation of the camera that captured them relative to some model of the environment. Current visual localization approaches typically work well when the images to be localized are captured under conditions similar to those present during mapping. However, when the environment exhibits large changes in visual appearance, due to e.g. variations in weather, season, time of day or viewpoint, the traditional pipelines break down. The reason is that the local image features used are based on low-level pixel-intensity information, which is not invariant to these changes: when the environment changes, a different set of keypoints is detected and their descriptors differ, which makes long-term visual localization a challenging problem.

This thesis includes five papers that present work towards solving the problem of long-term visual localization. Two of the papers present ideas for how semantic information can aid the localization process: one approach relies only on semantic information for visual localization, and the other shows how semantics can be used to detect outlier feature correspondences. The third paper considers how the output of a monocular depth-estimation network can be used to extract features that are less sensitive to viewpoint changes. The fourth paper is a benchmark paper, in which we present three new benchmark datasets aimed at evaluating localization algorithms in the context of long-term visual localization. Lastly, the fifth paper considers how to perform convolutions on spherical imagery, which might in the future be applied to learning local image features for the localization problem.


Visual localization

long-term localization

autonomous vehicles

camera pose estimation

self-driving cars

Online (Zoom)
Opponent: Dr. Vladlen Koltun, Intel, USA


Carl Toft

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Long-Term Visual Localization Revisited

IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44 (2022), p. 2074-2088

Journal article

Single-Image Depth Prediction Makes Feature Matching Easier

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 12361 LNCS (2020), p. 473-492

Paper in proceedings

Semantic Match Consistency for Long-Term Visual Localization

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 11206 LNCS (2018), p. 391-408

Paper in proceedings

Long-term 3D Localization and Pose from Semantic Labellings

IEEE International Conference on Computer Vision Workshops (2017), p. 650-659

Paper in proceedings

C. Toft, G. Bökman, F. Kahl. Azimuthal Rotational Equivariance in Spherical CNNs

The problem of visual localization is to answer the question "Where am I?" based on one or more images. This is a problem which humans seem to solve almost effortlessly as we go about our daily tasks: we track our position in the world with little effort, and successfully use this information to navigate and plan the path to our destination.

However, as is often the case, tasks which humans find easy and intuitive turn out to be very difficult to solve with a general algorithm, and visual localization is no exception.

Visual localization would be of great practical interest if it could be solved reliably, since it would allow robots to track their position with respect to a map of the world using only a camera as a sensor. A robot's knowledge of its position in the world is crucial for subsequent path planning and decision making.

Most current approaches to visual localization work well when localizing images captured under conditions similar to those present during map creation, but perform poorly when the weather, lighting or season differs from that of the images used for map building.

This thesis presents work towards increasing the robustness of localization systems to these changes. New methods for localization are presented, as well as three benchmark datasets for evaluating how these systems generalize across day-night, weather and seasonal changes.

Semantic Mapping and Visual Navigation for Smart Robots

Swedish Foundation for Strategic Research (SSF) (RIT15-0038), 2016-05-01 -- 2021-06-30.


C3SE (Chalmers Centre for Computational Science and Engineering)


Computer Vision and Robotics (Autonomous Systems)



Doctoral theses at Chalmers University of Technology. New series: 4890



