Pensieve Perception: Uncertainty, Language, and Novel Views for Autonomous Driving
Doctoral thesis, 2025
Novel view synthesis
Self-supervised learning
Object detection
Neural Radiance Fields
Autonomous driving
Gaussian Splatting
Author
Georg Hess
Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering
Object Detection as Probabilistic Set Prediction
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 13670 LNCS (2022), pp. 550-566
Paper in proceeding
LidarCLIP or: How I Learned to Talk to Point Clouds
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2024), pp. 7423-7432
Paper in proceeding
NeuRAD: Neural Rendering for Autonomous Driving
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2024), pp. 14895-14904
Paper in proceeding
SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025)
Paper in proceeding
In several cities, it is now possible to order a completely self-driving car with your phone, and it will take you from A to B.
While this is an impressive achievement, these robotaxis still operate only in a limited domain, such as a handful of cities, and they rely on hardware too expensive to put inside a consumer vehicle.
Thus, the dream of everyone having access to affordable autonomous cars remains an open challenge.
In this thesis, we explore three key aspects to accelerate the scalable development of autonomous driving systems.
First, we propose new ways to model the uncertainty of object detectors.
Object detection is a core component of many autonomous systems, used, for instance, to determine which other traffic participants are present and where they are. Modeling this uncertainty helps us understand the limitations of detectors and make informed decisions.
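As a minimal sketch of what uncertainty modeling for detection can look like, consider a detector that predicts not just a bounding box but a Gaussian distribution over each box coordinate. The negative log-likelihood (NLL) of the ground-truth box under that prediction then rewards well-calibrated uncertainty and penalizes overconfidence. All numbers and the box parameterization below are illustrative assumptions, not the thesis's actual method.

```python
import numpy as np

def gaussian_nll(mean, std, target):
    """Per-coordinate Gaussian negative log-likelihood, summed over the box.

    A detector that is wrong *and* reports a small std (overconfident)
    gets a large penalty; honest uncertainty keeps the NLL moderate.
    """
    var = std ** 2
    return float(np.sum(0.5 * np.log(2 * np.pi * var)
                        + (target - mean) ** 2 / (2 * var)))

# Hypothetical prediction: box center x, y, length, width (and its uncertainty).
pred_mean = np.array([10.0, 5.0, 4.2, 1.8])
pred_std = np.array([0.5, 0.5, 0.3, 0.2])
gt_box = np.array([10.3, 4.8, 4.0, 1.9])  # ground-truth box

nll = gaussian_nll(pred_mean, pred_std, gt_box)
# Shrinking the reported uncertainty tenfold without improving the mean
# makes the same prediction look overconfident and raises the NLL.
overconfident_nll = gaussian_nll(pred_mean, pred_std / 10, gt_box)
```

The same idea extends to sets of detections, where the predicted density must also account for how many objects are present.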
Second, training these autonomous systems usually requires vast amounts of labeled data, where humans mark objects and scenarios, a process that is both slow and expensive. Multiple methods have explored how to move away from explicit labels and instead use web-scale image-caption pairs to learn meaningful representations from natural language. We extend this approach to lidar, an important sensor in autonomous driving that directly measures the distance to surrounding surfaces, and we do so without needing any manual annotations. Our method enables us to use language to interact directly with 3D data and has the potential to reduce the reliance on human-provided annotations and speed up the development process.
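The core mechanism behind such language-lidar interaction can be sketched as embedding retrieval: a lidar encoder is trained so that its outputs land in the same embedding space as a pretrained text encoder, after which a free-text query retrieves the best-matching point cloud by cosine similarity. The random vectors below are stand-ins for real encoder outputs; the dimensions and setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    """Project embeddings onto the unit sphere so dot product = cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for real encoder outputs: one text query embedding
# (e.g. for "a truck at night") and embeddings for 100 lidar scans.
text_embedding = normalize(rng.standard_normal(512))
lidar_embeddings = normalize(rng.standard_normal((100, 512)))

# Retrieval: rank scans by cosine similarity to the text query.
scores = lidar_embeddings @ text_embedding
best_match = int(np.argmax(scores))
```

In practice the text encoder is typically frozen, and only the lidar encoder is trained to align with it, so no human annotations of the point clouds are needed.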
Last, testing self-driving cars in the real world to cover every possible scenario is impractical and risky.
To address this, we explore methods that turn collected sensor data into reconfigurable virtual environments.
In these environments, we can efficiently explore “what if” scenarios to quickly test self-driving systems in a wide range of situations, including scenarios that are rare or dangerous to encounter in the real world.
Together, these innovations bring us closer to a world where autonomous vehicles are a safe and integral part of everyday life.
Deep multi-object tracking for self-driving vehicles
Wallenberg AI, Autonomous Systems and Software Program, 2021-08-01 -- 2025-08-01.
Areas of Advance
Transport
Subject Categories (SSIF 2025)
Computer graphics and computer vision
Artificial Intelligence
Infrastructure
Chalmers e-Commons (incl. C3SE, 2020-)
ISBN
978-91-8103-209-3
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5667
Publisher
Chalmers