Masked Autoencoder for Self-Supervised Pre-Training on Lidar Point Clouds
Paper in proceeding, 2023

Masked autoencoding has become a successful pre-training paradigm for Transformer models for text, images, and, recently, point clouds. Raw automotive datasets are suitable candidates for self-supervised pre-training as they are generally cheap to collect compared to annotations for tasks like 3D object detection (OD). However, the development of masked autoencoders for point clouds has focused solely on synthetic and indoor data. Consequently, existing methods have tailored their representations and models toward small and dense point clouds with homogeneous point densities. In this work, we study masked autoencoding for point clouds in an automotive setting, which are sparse and for which the point density can vary drastically among objects in the same scene. To this end, we propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations. We pre-train the backbone of a Transformer-based 3D object detector to reconstruct masked voxels and to distinguish between empty and non-empty voxels. Our method improves the 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset. Further, we show that by pre-training with Voxel-MAE, we require only 40% of the annotated data to outperform a randomly initialized equivalent. Code is available at https://github.com/georghess/voxel-mae.
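The data preparation implied by the abstract (voxelize the point cloud, hide a subset of non-empty voxels, and keep an empty/non-empty label per voxel) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `voxelize`, `random_voxel_mask`, and `occupancy_grid`, the voxel size of 1.0, the mask ratio of 0.7, and the 10x10x10 grid are all assumptions chosen for illustration.

```python
import numpy as np


def voxelize(points, voxel_size):
    """Return the unique non-empty voxel indices and, per point, its voxel id."""
    idx = np.floor(points / voxel_size).astype(np.int64)
    voxels, inverse = np.unique(idx, axis=0, return_inverse=True)
    # reshape guards against NumPy versions that return a 2-D inverse
    return voxels, inverse.reshape(-1)


def random_voxel_mask(num_voxels, mask_ratio, rng):
    """Boolean mask over non-empty voxels; True means hidden from the encoder."""
    n_masked = int(round(mask_ratio * num_voxels))
    mask = np.zeros(num_voxels, dtype=bool)
    mask[rng.permutation(num_voxels)[:n_masked]] = True
    return mask


def occupancy_grid(voxels, grid_shape):
    """Dense empty/non-empty target; assumes indices lie inside grid_shape."""
    grid = np.zeros(grid_shape, dtype=bool)
    grid[tuple(voxels.T)] = True
    return grid


rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(200, 3))  # synthetic stand-in for a lidar sweep

voxels, point_to_voxel = voxelize(points, voxel_size=1.0)
mask = random_voxel_mask(len(voxels), mask_ratio=0.7, rng=rng)

# Points in visible voxels are the encoder input; points in masked voxels
# would serve as reconstruction targets.
visible_points = points[~mask[point_to_voxel]]
masked_points = points[mask[point_to_voxel]]

# Occupancy labels for the "empty vs. non-empty" objective.
occupancy = occupancy_grid(voxels, grid_shape=(10, 10, 10))
```

In this sketch the masking is uniform over non-empty voxels. How the masked voxels are actually selected, and how the reconstruction and occupancy losses are defined, is specified in the paper and the linked repository rather than here.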

3D object detection

Self-supervised

Object detection

Voxel-MAE

Deep learning

Masked autoencoding

Author

Georg Hess

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Zenseact AB

Johan Jaxing

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Elias Svensson

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

David Hagerman Olzon

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Christoffer Petersson

Zenseact AB

Chalmers, Mathematical Sciences, Algebra and geometry

Lennart Svensson

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Proceedings - 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, WACVW 2023

350-359
979-8-3503-2056-5 (ISBN)

IEEE Workshop on Applications of Computer Vision (WACV)
Waikoloa, USA

Deep multi-object tracking for self-driving vehicles

Wallenberg AI, Autonomous Systems and Software Program, 2021-08-01 -- 2025-08-01.

Areas of Advance

Transport

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

Subject Categories

Computer Vision and Robotics (Autonomous Systems)

DOI

10.1109/WACVW58289.2023.00039

More information

Latest update

7/19/2023