SMAB: Simple Multimodal Attention for Effective BEV Fusion
Paper in proceeding, 2025

Sensor fusion plays a crucial role in accurate and robust environment perception for autonomous driving. Recent works use a Bird's-Eye-View (BEV) grid as a 3D representation, but rely on only a partial set of multimodal signals. This paper introduces Simple-Multimodal-Attention-BEV (SMAB), a novel and simple approach to multimodal sensor fusion in BEV perception. We propose an attention mechanism called BEV feature aggregation that effectively enhances BEV feature representations. It integrates bilinearly interpolated semantic data from cameras with rasterized distance information from radars and/or lidars, and it supports training with full-modality or partial-modality data without modification of the method. In addition to the simplicity of the design, we demonstrate that using all sensor modalities improves segmentation accuracy. SMAB is also resilient to sporadic sensor signal loss, which enhances the robustness of the perception system. The proposed method outperforms state-of-the-art methods while simplifying the model.
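
The abstract names three ingredients: bilinear interpolation of camera features, rasterization of radar/lidar returns, and attention-based aggregation that tolerates missing modalities. The NumPy sketch below illustrates each one in a generic form. It is not the authors' implementation: the function names, tensor shapes, grid extent, and the use of a per-feature mean in place of learned attention scores are all assumptions made for illustration.

```python
import numpy as np

def bilinear_sample(feat, xs, ys):
    """Sample feat (H, W, C) at float pixel coords xs, ys of shape (N,) -> (N, C)."""
    H, W, _ = feat.shape
    xs = np.clip(xs, 0, W - 1)
    ys = np.clip(ys, 0, H - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (xs - x0)[:, None]
    wy = (ys - y0)[:, None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def rasterize_points(points_xy, grid_size, extent):
    """Binary occupancy raster of (N, 2) points in metres over [-extent, extent)^2."""
    grid = np.zeros((grid_size, grid_size), dtype=np.float32)
    idx = np.floor((points_xy + extent) / (2 * extent) * grid_size).astype(int)
    keep = np.all((idx >= 0) & (idx < grid_size), axis=1)
    grid[idx[keep, 1], idx[keep, 0]] = 1.0  # row = y index, column = x index
    return grid

def fuse(features, available):
    """Softmax-weighted sum over modalities; unavailable modalities get zero weight.

    features:  (M, N, C) finite per-modality features for N BEV cells.
    available: (M,) boolean mask; at least one entry must be True.
    The mean over C is a hypothetical stand-in for learned attention scores.
    """
    scores = features.mean(axis=-1)                        # (M, N)
    scores = np.where(available[:, None], scores, -np.inf)
    scores = scores - scores.max(axis=0, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)
    return (weights[..., None] * features).sum(axis=0)     # (N, C)
```

Masking the scores before the softmax is one simple way to let the same code run with any subset of sensors, which mirrors the full-modality or partial-modality property described in the abstract without claiming to reproduce how SMAB achieves it.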

lightweight sensor fusion architecture

radar

multimodal fusion

camera

sparse signal fusion

multimodal BEV fusion

sensor fusion

multimodal attention BEV

Multimodal learning

lidar

deep learning

BEV feature aggregation

BEV

Author

Amer Mustajbasic

University of Gothenburg

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Shuangshuang Chen

Volvo Cars

Erik Stenborg

Zenseact AB

Selpi Selpi

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

University of Gothenburg

IEEE Intelligent Vehicles Symposium, Proceedings

1766-1772
9798331538033 (ISBN)

36th IEEE Intelligent Vehicles Symposium, IV 2025
Cluj-Napoca, Romania

Deep MultiModal Learning for Automotive Applications

VINNOVA (2023-00763), 2023-09-01 -- 2027-09-01.

Areas of Advance

Information and Communication Technology

Transport

Subject Categories (SSIF 2025)

Computer graphics and computer vision

Computer Sciences

Infrastructure

C3SE (-2020, Chalmers Centre for Computational Science and Engineering)

DOI

10.1109/IV64158.2025.11097770

More information

Latest update

9/4/2025 9