GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving
Paper in proceedings, 2026

Self-supervised pre-training based on next-token prediction has enabled large language models to capture the underlying structure of text and, when applied at scale, has led to unprecedented performance across a wide range of tasks. Similarly, autonomous driving generates vast amounts of spatiotemporal data, suggesting that scale could likewise be harnessed to learn the underlying geometric and semantic structure of the environment and its evolution over time. In this direction, we propose GASP, a geometric and semantic self-supervised pre-training method that learns a unified representation by predicting, at any queried future point in spacetime, (1) general occupancy, capturing the evolving structure of the 3D scene; (2) ego occupancy, modeling the ego vehicle's path through the environment; and (3) distilled high-level features from a vision foundation model. By modeling geometric and semantic 4D occupancy fields instead of raw sensor measurements, the model learns a structured, generalizable representation of the environment and its evolution through time. We validate GASP on multiple autonomous driving benchmarks, demonstrating significant improvements in semantic occupancy forecasting, online mapping, and ego trajectory prediction. Our results show that continuous 4D geometric and semantic occupancy prediction provides a scalable and effective pre-training paradigm for autonomous driving. For code and additional visualizations, see our project page.
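The abstract describes three prediction targets evaluated at queried points in spacetime. The sketch below illustrates one way such a multi-task objective could be combined, assuming a decoder has already produced one embedding per queried (x, y, z, t) point. The linear heads, binary cross-entropy for the two occupancy targets, cosine distance for the feature-distillation target, equal loss weights, and the names `gasp_style_loss`, `heads`, and `targets` are all hypothetical choices for illustration, not the paper's implementation.

```python
import numpy as np

def bce_with_logits(logits, targets):
    # Numerically stable binary cross-entropy computed on raw logits.
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

def cosine_distance(pred, target, eps=1e-8):
    # Mean (1 - cosine similarity) between predicted and teacher feature vectors.
    pred_n = pred / (np.linalg.norm(pred, axis=-1, keepdims=True) + eps)
    tgt_n = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    return np.mean(1.0 - np.sum(pred_n * tgt_n, axis=-1))

def gasp_style_loss(query_embeddings, heads, targets, weights=(1.0, 1.0, 1.0)):
    """Hypothetical combined loss over N queried spacetime points.

    query_embeddings: (N, D) decoder features at queried (x, y, z, t) points.
    heads: hypothetical linear heads {"occ": (D,), "ego": (D,), "feat": (D, F)}.
    targets: {"occ": (N,) in {0, 1}, "ego": (N,) in {0, 1},
              "feat": (N, F) features from a frozen vision foundation model}.
    """
    occ_logits = query_embeddings @ heads["occ"]   # general occupancy, (N,)
    ego_logits = query_embeddings @ heads["ego"]   # ego occupancy, (N,)
    feat_pred = query_embeddings @ heads["feat"]   # distilled features, (N, F)

    l_occ = bce_with_logits(occ_logits, targets["occ"])
    l_ego = bce_with_logits(ego_logits, targets["ego"])
    l_feat = cosine_distance(feat_pred, targets["feat"])

    w_occ, w_ego, w_feat = weights
    total = w_occ * l_occ + w_ego * l_ego + w_feat * l_feat
    return total, {"occ": l_occ, "ego": l_ego, "feat": l_feat}

# Usage with random placeholder data (no real sensor data involved).
rng = np.random.default_rng(0)
N, D, F = 8, 16, 4
heads = {"occ": rng.normal(size=D), "ego": rng.normal(size=D),
         "feat": rng.normal(size=(D, F))}
targets = {"occ": rng.integers(0, 2, size=N).astype(float),
           "ego": rng.integers(0, 2, size=N).astype(float),
           "feat": rng.normal(size=(N, F))}
total, parts = gasp_style_loss(rng.normal(size=(N, D)), heads, targets)
```

Because every target is evaluated at the same queried points, the three terms share one representation; in this sketch the relative weighting is left as a free parameter rather than a value taken from the paper.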

Authors

William Ljungbergh

Zenseact AB

Linköping University

Adam Lilja

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Zenseact AB

Adam Tonderski

Zenseact AB

Lund University

Arvid Laveno Ling

Zenseact AB

Carl Lindström

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Zenseact AB

Willem Verbeke

Zenseact AB

Junsheng Fu

Zenseact AB

Christoffer Petersson

Zenseact AB

Chalmers, Mathematical Sciences, Algebra and Geometry

Lars Hammarstrand

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Michael Felsberg

Linköping University

Proceedings of the 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)

pp. 3077–3087
ISBN: 9798331555115

2026 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2026
Tucson, USA

Subject categories (SSIF 2025)

Computer graphics and computer vision

Computer science

DOI

10.1109/WACV61042.2026.00301

Last updated

2026-06-23