A Novel Vision Transformer for Camera-LiDAR Fusion Based Traffic Object Segmentation
Paper in proceedings, 2025

This paper presents Camera-LiDAR Fusion Transformer (CLFT) models for traffic object segmentation, which leverage the fusion of camera and LiDAR data using vision transformers. Building on the methodology of vision transformers that exploit the self-attention mechanism, we extend segmentation capabilities with additional classification options to a diverse set of object classes, including cyclists, traffic signs, and pedestrians, across diverse weather conditions. Despite good overall performance, the models face challenges under adverse conditions, which underscores the need for further optimization to enhance performance in darkness and rain. In summary, the CLFT models offer a compelling solution for autonomous driving perception, advancing the state-of-the-art in multimodal fusion and object segmentation, with ongoing efforts required to address existing limitations and fully harness their potential in practical deployments.
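The abstract describes fusing camera and LiDAR features through a transformer's self-attention mechanism. As a rough illustration of the idea (not the paper's actual CLFT architecture), the sketch below fuses LiDAR tokens into camera tokens with scaled dot-product cross-attention; the function and variable names are illustrative assumptions, not identifiers from the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(cam_tokens, lidar_tokens):
    """Fuse LiDAR tokens into camera tokens: camera features act as
    queries, LiDAR features as keys/values (hypothetical simplification
    of transformer-based sensor fusion, with a residual connection)."""
    d_k = cam_tokens.shape[-1]
    scores = cam_tokens @ lidar_tokens.T / np.sqrt(d_k)   # (N_cam, N_lidar)
    weights = softmax(scores, axis=-1)                    # rows sum to 1
    return cam_tokens + weights @ lidar_tokens            # (N_cam, d)

# Toy example: 4 camera tokens and 6 LiDAR tokens, 8-dim embeddings.
rng = np.random.default_rng(0)
cam = rng.standard_normal((4, 8))
lidar = rng.standard_normal((6, 8))
fused = cross_attention_fuse(cam, lidar)
print(fused.shape)  # (4, 8)
```

A real implementation would use learned query/key/value projections and multiple heads; this stripped-down version only shows how attention weights let one modality aggregate features from the other.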

Semantic Segmentation

Dense Vision Transformers

Residual Neural Network

Sensor Fusion

Authors

Toomas Tahves

Tallinn University of Technology (TalTech)

Junyi Gu

Chalmers, Computer Science and Engineering, Interaction Design and Software Engineering

Tallinn University of Technology (TalTech)

University of Gothenburg

Mauro Bellone

Tallinn University of Technology (TalTech)

Raivo Sell

Tallinn University of Technology (TalTech)

International Conference on Agents and Artificial Intelligence

2184-3589 (ISSN), 2184-433X (eISSN)

Vol. 2, pp. 566-573

17th International Conference on Agents and Artificial Intelligence, ICAART 2025
Porto, Portugal

Subject categories (SSIF 2025)

Computer graphics and computer vision

DOI

10.5220/0013239000003890

More information

Last updated

2025-04-15