SMAB: Simple Multimodal Attention for Effective BEV Fusion
Paper in proceedings, 2025

Sensor fusion plays a crucial role in accurate and robust environment perception for autonomous driving. Recent works use a Bird's-Eye-View (BEV) grid as a 3D representation, but they use only a partial set of the available multimodal signals. This paper introduces Simple-Multimodal-Attention-BEV (SMAB), a novel and simple approach to multimodal sensor fusion in BEV perception. We propose an attention mechanism called BEV feature aggregation that effectively enhances BEV feature representations. It integrates bilinearly interpolated semantic data from cameras with rasterized distance information from radars and/or lidars, and it supports training with full-modality or partial-modality data without modification of the method. In addition to the simplicity of the design, we demonstrate that using all sensor modalities improves segmentation accuracy. SMAB is also resilient to sporadic sensor signal loss, which enhances the robustness of the perception system. The proposed method outperforms state-of-the-art methods while simplifying the model.
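The two input preparations named in the abstract, rasterizing radar/lidar returns onto a BEV grid and bilinearly interpolating camera features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and does not include the attention-based aggregation itself; the function names, grid size, and metric extent below are assumptions chosen purely for illustration.

```python
import numpy as np


def rasterize_points_to_bev(points_xy, grid_size=(200, 200), extent=50.0):
    """Mark BEV cells that contain at least one radar/lidar return.

    points_xy: (N, 2) array of x, y positions in metres, ego at the origin.
    extent: half-width of the square area covered by the grid (assumed value).
    """
    h, w = grid_size
    bev = np.zeros((h, w), dtype=np.float32)
    # Map metric coordinates in [-extent, extent) to integer cell indices.
    cols = np.floor((points_xy[:, 0] + extent) / (2 * extent) * w).astype(int)
    rows = np.floor((points_xy[:, 1] + extent) / (2 * extent) * h).astype(int)
    # Drop returns that fall outside the grid.
    inside = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    bev[rows[inside], cols[inside]] = 1.0
    return bev


def bilinear_sample(feature_map, u, v):
    """Sample a (H, W, C) feature map at a continuous pixel location (u, v)."""
    h, w, _ = feature_map.shape
    # Clamp to the image so the four neighbours always exist.
    u = float(np.clip(u, 0, w - 1))
    v = float(np.clip(v, 0, h - 1))
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    # Interpolate along u on the two neighbouring rows, then along v.
    top = (1 - du) * feature_map[v0, u0] + du * feature_map[v0, u1]
    bottom = (1 - du) * feature_map[v1, u0] + du * feature_map[v1, u1]
    return (1 - dv) * top + dv * bottom
```

In a fusion pipeline of this kind, each BEV cell would typically be projected into the camera image to obtain (u, v), and the sampled camera feature would then be combined with the rasterized range channel; how SMAB performs that combination is described in the paper rather than here.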

lightweight sensor fusion architecture

radar

multimodal fusion

camera

sparse signal fusion

multimodal BEV fusion

sensor fusion

multimodal attention BEV

Multimodal learning

lidar

deep learning

BEV feature aggregation

BEV

Authors

Amer Mustajbasic

University of Gothenburg

Chalmers, Computer Science and Engineering, Data Science and AI

Shuangshuang Chen

Volvo Cars

Erik Stenborg

Zenseact AB

Selpi Selpi

Chalmers, Computer Science and Engineering, Data Science and AI

University of Gothenburg

IEEE Intelligent Vehicles Symposium, Proceedings

1766-1772
9798331538033 (ISBN)

36th IEEE Intelligent Vehicles Symposium, IV 2025
Cluj-Napoca, Romania

Deep multimodal learning for automotive applications

VINNOVA (2023-00763), 2023-09-01 -- 2027-09-01.

Areas of Advance

Information and Communication Technology

Transport

Subject Categories (SSIF 2025)

Computer graphics and computer vision

Computer Sciences

Infrastructure

C3SE (-2020, Chalmers Centre for Computational Science and Engineering)

DOI

10.1109/IV64158.2025.11097770

More information

Last updated

2025-09-04