LidarCLIP or: How I Learned to Talk to Point Clouds

Georg Hess; Adam Tonderski; Christoffer Petersson; Kalle Åström; Lennart Svensson

doi:10.1109/WACV57701.2024.00727

Paper in proceeding, 2024

Research connecting text and images has recently seen several breakthroughs, with models like CLIP, DALL·E 2, and Stable Diffusion. However, the connection between text and other visual modalities, such as lidar data, has received less attention, hindered by the lack of text-lidar datasets. In this work, we propose LidarCLIP, a mapping from automotive point clouds to a pre-existing CLIP embedding space. Using image-lidar pairs, we supervise a point cloud encoder with the image CLIP embeddings, effectively relating text and lidar data with the image domain as an intermediary. We show the effectiveness of LidarCLIP by demonstrating that lidar-based retrieval is generally on par with image-based retrieval, but with complementary strengths and weaknesses. By combining image and lidar features, we improve upon both single-modality methods and enable a targeted search for challenging detection scenarios under adverse sensor conditions. We also explore zero-shot classification and show that LidarCLIP outperforms existing attempts to use CLIP for point clouds by a large margin. Finally, we leverage our compatibility with CLIP to explore a range of applications, such as point cloud captioning and lidar-to-image generation, without any additional training. Code and pre-trained models are available at github.com/atonderski/lidarclip.
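The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the training loss or how image and lidar features are combined, so the cosine-based loss and the feature averaging below are assumptions, and all function names and toy vectors are hypothetical. The sketch only shows how a lidar embedding trained to match a frozen CLIP image embedding can then be compared directly with a text embedding for retrieval.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def distillation_loss(lidar_emb, image_emb):
    """ASSUMED objective: 1 - cosine similarity between the lidar encoder
    output and the frozen CLIP image embedding of the paired image."""
    return 1.0 - cosine(lidar_emb, image_emb)


def retrieve(query_emb, candidate_embs, top_k=1):
    """Rank candidates by similarity to the query; return the top_k indices."""
    scores = [cosine(query_emb, e) for e in candidate_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


def joint_embeddings(image_embs, lidar_embs):
    """ASSUMED fusion: average the unit-normalized image and lidar features."""
    return [
        [(a + b) / 2 for a, b in zip(normalize(img), normalize(lid))]
        for img, lid in zip(image_embs, lidar_embs)
    ]


# Toy 3-D stand-ins for CLIP embeddings (real CLIP embeddings are much larger).
text_emb = [1.0, 0.0, 0.0]
image_embs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
lidar_embs = [[0.8, 0.0, 0.2], [0.1, 0.9, 0.0]]

best_lidar = retrieve(text_emb, lidar_embs)
best_joint = retrieve(text_emb, joint_embeddings(image_embs, lidar_embs))
```

Because text, image, and lidar embeddings share one space in this setup, the same `retrieve` helper also covers a zero-shot classification sketch: pass a lidar embedding as the query and the embeddings of class prompts as the candidates.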

multi-modal

autonomous vehicles

point clouds

lidar

autonomous driving

Author

Georg Hess

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering


Adam Tonderski

Zenseact AB

Christoffer Petersson

Chalmers, Mathematical Sciences, Algebra and geometry


Kalle Åström

Lund University

Lennart Svensson

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering


Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

2642-9381 (ISSN)

7423-7432
979-8-3503-1892-0 (ISBN)

2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Waikoloa, USA

Deep multi-object tracking for ground truth trajectory estimation

VINNOVA (2017-05521), 2018-07-01 -- 2022-06-30.


Areas of Advance

Transport

Subject Categories

Computer Vision and Robotics (Autonomous Systems)

DOI

10.1109/WACV57701.2024.00727


More information

Created

4/12/2024
