Pensieve Perception: Uncertainty, Language, and Novel Views for Autonomous Driving
Doctoral thesis, 2025

Autonomous driving stands as one of the most complex challenges in artificial intelligence and robotics. These systems are expected to provide safe and efficient navigation across diverse and dynamic environments. This thesis addresses critical obstacles in the development of autonomous driving technologies by focusing on three primary areas: uncertainty estimation in object detection, reduction of annotation requirements through self-supervised learning, and the creation of scalable, realistic simulations via neural rendering. First, we introduce a novel framework for uncertainty estimation in object detection, leveraging the Random Finite Set formalism to provide a principled approach for training and evaluating probabilistic object detectors. This framework enables practitioners to better understand the capabilities and limitations of object detectors, and effectively design and deploy safer and more reliable systems. Second, we present LidarCLIP, a self-supervised learning method designed to bridge the gap between lidar point clouds and language understanding. By aligning lidar data with the CLIP embedding space through image-point cloud pairs, LidarCLIP learns semantic scene understanding without the need for costly human annotations. This approach has the potential to significantly reduce the dependency on labeled data, accelerating the development and deployment of autonomous driving systems. Last, we develop NeuRAD and SplatAD, advanced neural rendering techniques for joint camera and lidar simulation. NeuRAD offers a state-of-the-art neural simulator that facilitates scalable and sensor-realistic closed-loop simulations, while SplatAD enhances this capability by improving both visual quality and computational efficiency. These methods pave the way for scalable validation and testing of autonomous driving systems in diverse and rare scenarios, facilitating comprehensive safety assessments. 
Collectively, this thesis addresses existing challenges in autonomous driving and opens up new avenues for future research in building safe and reliable autonomous driving systems at scale.

Novel view synthesis

Self-supervised learning

Object detection

Neural Radiance Fields

Autonomous driving

Gaussian Splatting

ED, Hörsalsvägen 11
Opponent: Manmohan Chandraker, UC San Diego, USA

Author

Georg Hess

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Object Detection as Probabilistic Set Prediction

Lecture Notes in Computer Science, Vol. 13670 LNCS (2022), pp. 550–566

Paper in proceeding

LidarCLIP or: How I Learned to Talk to Point Clouds

Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2024), pp. 7423–7432

Paper in proceeding

NeuRAD: Neural Rendering for Autonomous Driving

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2024), pp. 14895–14904

Paper in proceeding

SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)

Paper in proceeding

We live during a time when autonomous cars are turning from sci-fi to reality.
In several cities, it is now possible to order a completely self-driving car with your phone, and it will take you from A to B.
While this is an impressive achievement, these robotaxis still operate only in a limited domain, such as a handful of cities, and they rely on hardware too expensive to put in a consumer vehicle.
Thus, the dream of everyone having access to affordable autonomous cars remains an open challenge.

In this thesis, we explore three key aspects to accelerate the scalable development of autonomous driving systems.
First, we propose new ways to model the uncertainty of object detectors.
Object detection is an important aspect of many autonomous systems and is used, for instance, to understand what other surrounding traffic participants are present and where they are.
Modeling the uncertainty is useful for understanding the limitations of detectors and for making informed decisions.
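One common way to make a detector's uncertainty concrete is to let it predict a variance alongside each regressed box parameter and then score it with a Gaussian negative log-likelihood. The thesis develops a more principled set-level formulation based on Random Finite Sets, but a per-box sketch (all function names and numbers here are illustrative, not from the thesis) shows the basic mechanism:

```python
import math

def box_nll(pred_mean, pred_var, gt_box):
    """Negative log-likelihood of a ground-truth box under an
    independent-Gaussian prediction (one term per box parameter)."""
    nll = 0.0
    for mu, var, x in zip(pred_mean, pred_var, gt_box):
        nll += 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return nll

# A confident (low-variance) detector is rewarded when it is right...
good = box_nll([10.0, 5.0], [0.1, 0.1], [10.1, 5.0])
# ...and punished hard when it is wrong by the same margin of confidence.
bad = box_nll([10.0, 5.0], [0.1, 0.1], [12.0, 5.0])
```

A confident detector gets a low (good) score when it is right and a sharply worse one when it is overconfident and wrong, which is exactly the behavior that makes such scores useful for understanding a detector's limitations.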
Second, training these autonomous systems usually requires vast amounts of labeled data, where humans mark objects and scenarios, a process both slow and expensive.
Multiple methods have explored how we can move away from explicit labels and instead use web-scale image-caption pairs to learn meaningful representations from natural language.
We extend this approach to lidar, an important sensor in autonomous driving that directly measures the distance to surrounding surfaces, and do so without needing any manual annotations.
Our method enables us to use language to directly interact with 3D data and has the potential to reduce the reliance on human-provided annotations and speed up the development process.
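The alignment step can be pictured as feature distillation: a frozen CLIP image encoder embeds each camera image, and a lidar encoder is trained so that the paired point cloud lands at the same spot in the embedding space. A minimal sketch of such an alignment loss (the encoders themselves are omitted, and all names are illustrative), assuming embeddings are compared with cosine similarity as in CLIP:

```python
import numpy as np

def cosine_align_loss(image_emb, lidar_emb):
    """Mean cosine distance between paired image and lidar embeddings.
    Minimizing this pulls each lidar embedding toward its image's
    (frozen) CLIP embedding, so the lidar encoder inherits CLIP's
    language-aligned space without any human labels."""
    img = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    pts = lidar_emb / np.linalg.norm(lidar_emb, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(img * pts, axis=-1)))

# Identical embeddings give zero loss; orthogonal ones give a loss of 1.
same = cosine_align_loss(np.eye(2), np.eye(2))         # -> 0.0
ortho = cosine_align_loss(np.eye(2), np.eye(2)[::-1])  # -> 1.0
```

Because the image encoder already sits in a language-aligned space, text queries can then be matched directly against lidar embeddings once this loss is driven down.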
Finally, testing self-driving cars in the real world to cover every possible scenario is impractical and risky.
To solve this, we explore ways that let us turn collected sensor data into reconfigurable virtual environments.
In these environments, we can efficiently explore “what if” scenarios to quickly test self-driving systems in a wide range of situations, including scenarios that are rare or dangerous to encounter in the real world.
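These virtual environments are built with neural rendering. NeuRAD and SplatAD use far more elaborate scene representations, but the core idea they build on, compositing color along a sensor ray through a learned volumetric scene, can be sketched in a few lines (the quadrature below is the standard NeRF-style one; the sample values are illustrative):

```python
import numpy as np

def render_ray(densities, colors, deltas):
    """Classic volume-rendering quadrature used by NeRF-style methods:
    alpha-composite per-sample colors along a camera ray."""
    alphas = 1.0 - np.exp(-densities * deltas)  # per-sample opacity
    # Transmittance: probability the ray reaches each sample unblocked.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return weights @ colors, weights

# Three samples along a ray: empty space, a dense red surface,
# then a green sample hidden behind it.
densities = np.array([0.0, 50.0, 50.0])
colors = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
deltas = np.array([0.1, 0.1, 0.1])
rgb, w = render_ray(densities, colors, deltas)  # rgb is dominated by red
```

The first opaque surface along the ray absorbs nearly all of the weight, which is what lets such models reproduce occlusions, and, with suitable extensions, lidar returns, from real recorded drives.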
Together, these innovations bring us closer to a world where autonomous vehicles are a safe and integral part of everyday life.

Deep multi-object tracking for self-driving vehicles

Wallenberg AI, Autonomous Systems and Software Program, 2021-08-01 -- 2025-08-01.

Areas of Advance

Transport

Subject Categories (SSIF 2025)

Computer graphics and computer vision

Artificial Intelligence

Infrastructure

Chalmers e-Commons (incl. C3SE, 2020-)

ISBN

978-91-8103-209-3

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5667

Publisher

Chalmers


More information

Latest update

5/5/2025