LLMs Can Check Their Own Results to Mitigate Hallucinations in Traffic Understanding Tasks
Paper in proceedings, 2025

Today’s Large Language Models (LLMs) have showcased exemplary capabilities, ranging from simple text generation to advanced image processing. Given their ability to process multi-modal data, such models are currently being explored for in-vehicle services, for example to support perception tasks in Advanced Driver Assistance Systems (ADAS) or Autonomous Driving (AD) systems. However, LLMs often generate nonsensical or unfaithful information, known as “hallucinations”: a notable issue that needs to be mitigated. In this paper, we systematically explore the adoption of SelfCheckGPT to spot hallucinations by three state-of-the-art LLMs (GPT-4o, LLaVA, and Llama3) when analysing visual automotive data from two sources: the Waymo Open Dataset, from the US, and the PREPER CITY dataset, from Sweden. Our results show that GPT-4o is better at generating faithful image captions than LLaVA, although the former was more prone than the latter to mislabeling non-hallucinated content as hallucinations. Furthermore, the analysis of the performance metrics revealed that the dataset type (Waymo or PREPER CITY) did not significantly affect the quality of the captions or the effectiveness of hallucination detection. However, the models performed better on images captured during daytime than on those captured at dawn, dusk, or night. Overall, the results show that SelfCheckGPT and its adaptation can be used to filter hallucinations from generated traffic-related image captions for state-of-the-art LLMs.
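The core idea behind SelfCheckGPT is sampling-based consistency checking: the model is queried several times for the same input, and statements in the main response that are poorly supported by the stochastic samples are flagged as likely hallucinations. The sketch below illustrates this idea in simplified form; the word-overlap scorer and the threshold are illustrative stand-ins (the actual method scores consistency with stronger techniques, e.g. NLI or LLM-based prompting), and the function names are our own, not from the paper.

```python
# Simplified illustration of the SelfCheckGPT sampling-based consistency idea.
# Assumption: the overlap-based scorer below is a toy stand-in for the
# stronger consistency scorers used in practice (BERTScore, NLI, prompting).

def consistency_score(sentence: str, samples: list[str]) -> float:
    """Average fraction of this sentence's content words that also appear
    in each sampled caption. Low scores suggest possible hallucination."""
    words = {w.lower().strip(".,") for w in sentence.split() if len(w) > 3}
    if not words:
        return 1.0  # nothing substantive to check
    support = 0.0
    for sample in samples:
        sample_words = {w.lower().strip(".,") for w in sample.split()}
        support += len(words & sample_words) / len(words)
    return support / len(samples)

def flag_hallucinations(caption: str, samples: list[str],
                        threshold: float = 0.3) -> list[str]:
    """Return the sentences of `caption` whose support across the
    stochastic samples falls below `threshold`."""
    sentences = [s.strip() for s in caption.split(".") if s.strip()]
    return [s for s in sentences
            if consistency_score(s, samples) < threshold]
```

For example, a caption sentence such as "A pedestrian crosses the street" scores highly if most re-sampled captions also mention a pedestrian and a street, and would be flagged if the samples instead describe, say, an empty road.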

multi-modal data

hallucination detection

perception systems

safety-critical systems

large language models

automotive

Authors

Malsha Ashani Mahawatta Dona

University of Gothenburg

Chalmers, Computer Science and Engineering, Interaction Design and Software Engineering

Beatriz Cabrero-Daniel

Chalmers, Computer Science and Engineering, Interaction Design and Software Engineering

University of Gothenburg

Yinan Yu

Chalmers, Computer Science and Engineering, Functional Programming

Christian Berger

University of Gothenburg

Chalmers, Computer Science and Engineering, Interaction Design and Software Engineering

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

0302-9743 (ISSN) 1611-3349 (eISSN)

Vol. 15383 LNCS, pp. 114-130
978-3-031-80888-3 (ISBN)

36th IFIP WG 6.1 International Conference on Testing Software and Systems, ICTSS 2024
London, United Kingdom

SAICOM

Stiftelsen för Strategisk forskning (SSF) (FUS21-0004), 2022-06-01 -- 2027-05-31.

Subject categories (SSIF 2025)

Natural language processing and computational linguistics

DOI

10.1007/978-3-031-80889-0_8

More information

Last updated

2025-03-07