BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems
Paper in proceeding, 2025

Large language models (LLMs) are increasingly being extended to process multimodal data, such as text and video, simultaneously. Their remarkable performance in understanding what is shown in images surpasses that of specialized neural networks (NNs) such as YOLO, which support only a well-formed but very limited vocabulary, i.e., the objects that they are able to detect. When unrestricted, LLMs, and in particular state-of-the-art vision language models (VLMs), show impressive performance in describing even complex traffic situations. This makes them potentially suitable components for automotive perception systems to support the understanding of complex traffic situations or edge cases. However, LLMs and VLMs are prone to hallucination, which means either not seeing traffic agents, such as vulnerable road users, who are present in a situation, or seeing traffic agents who are not there in reality. While the latter is unwanted because it makes an advanced driver assistance system (ADAS) or autonomous driving system (ADS) slow down unnecessarily, the former could lead to disastrous decisions by an ADS. In our work, we systematically assess the performance of three state-of-the-art VLMs on a diverse subset of traffic situations sampled from the Waymo Open Dataset to support safety guardrails for capturing such hallucinations in VLM-supported perception systems. We observe that both proprietary and open VLMs exhibit remarkable image understanding capabilities, even paying thorough attention to fine details that are sometimes difficult for us humans to spot. However, they are still prone to making up elements in their descriptions, which to date requires hallucination detection strategies such as BetterCheck, which we propose in our work.

Author

Malsha Ashani Mahawatta Dona

Chalmers, Computer Science and Engineering (Chalmers), Interaction Design and Software Engineering

University of Gothenburg

Beatriz Cabrero-Daniel

Chalmers, Computer Science and Engineering (Chalmers), Interaction Design and Software Engineering

University of Gothenburg

Yinan Yu

Chalmers, Computer Science and Engineering (Chalmers), Functional Programming

University of Gothenburg

Christian Berger

University of Gothenburg

Chalmers, Computer Science and Engineering (Chalmers), Interaction Design and Software Engineering

IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC

2153-0009 (ISSN) 2153-0017 (eISSN)

3776-3783
9798331524180 (ISBN)

28th International Conference on Intelligent Transportation Systems, ITSC 2025
Gold Coast, Australia

SAICOM

Swedish Foundation for Strategic Research (SSF) (FUS21-0004), 2022-06-01 -- 2027-05-31.

Subject Categories (SSIF 2025)

Computer graphics and computer vision

DOI

10.1109/ITSC60802.2025.11423129

More information

Latest update

5/4/2026 8