A Novel Vision Transformer for Camera-LiDAR Fusion Based Traffic Object Segmentation
Paper in proceedings, 2025

This paper presents Camera-LiDAR Fusion Transformer (CLFT) models for traffic object segmentation, which fuse camera and LiDAR data using vision transformers. Building on the methodology of vision transformers that exploit the self-attention mechanism, we extend segmentation capabilities with additional classification options covering a diverse set of objects, including cyclists, traffic signs, and pedestrians, across varying weather conditions. Despite good overall performance, the models face challenges under adverse conditions, which underscores the need for further optimization to enhance performance in darkness and rain. In summary, the CLFT models offer a compelling solution for autonomous driving perception, advancing the state-of-the-art in multimodal fusion and object segmentation, with ongoing efforts required to address existing limitations and fully harness their potential in practical deployments.
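The camera-LiDAR fusion the abstract alludes to can be illustrated, in very rough outline, as self-attention over a joint token sequence from both modalities. This is an illustrative sketch under assumed shapes and names, not the paper's actual CLFT architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fusion(cam_tokens, lidar_tokens, Wq, Wk, Wv):
    """Fuse camera and LiDAR patch tokens with one self-attention pass.

    Concatenating the two token sequences lets attention mix
    information across modalities (hypothetical helper, for
    illustration only).
    """
    tokens = np.concatenate([cam_tokens, lidar_tokens], axis=0)  # (N, d)
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # (N, N) cross-modal weights
    return attn @ V  # fused token representations

rng = np.random.default_rng(0)
d = 8
cam = rng.standard_normal((4, d))    # 4 camera patch tokens (toy data)
lidar = rng.standard_normal((4, d))  # 4 LiDAR patch tokens (toy data)
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
fused = self_attention_fusion(cam, lidar, Wq, Wk, Wv)
print(fused.shape)  # every output token has attended to both modalities
```

In a real dense-prediction transformer, the fused tokens would then be reassembled into a feature map and decoded into per-pixel class scores.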

Semantic Segmentation

Dense Vision Transformers

Residual Neural Network

Sensor Fusion

Authors

Toomas Tahves

Tallinn University of Technology (TalTech)

Junyi Claude Gu

Chalmers, Computer Science and Engineering (Chalmers), Interaction Design and Software Engineering

Tallinn University of Technology (TalTech)

University of Gothenburg

Mauro Bellone

Tallinn University of Technology (TalTech)

Raivo Sell

Tallinn University of Technology (TalTech)

International Conference on Agents and Artificial Intelligence

2184-3589 (ISSN), 2184-433X (eISSN)

Vol. 2, pp. 566-573

17th International Conference on Agents and Artificial Intelligence, ICAART 2025, Porto, Portugal

Subject Categories (SSIF 2025)

Computer graphics and computer vision

DOI

10.5220/0013239000003890

Latest update

4/15/2025