The Effect of Scaling, Retrieval Augmentation and Form on the Factual Consistency of Language Models
Paper in proceedings, 2023

Large Language Models (LLMs) make natural interfaces to factual knowledge, but their usefulness is limited by their tendency to deliver inconsistent answers to semantically equivalent questions. For example, a model might predict both “Anne Redpath passed away in Edinburgh.” and “Anne Redpath's life ended in London.” In this work, we identify potential causes of inconsistency and evaluate the effectiveness of two mitigation strategies: up-scaling and augmenting the LM with a retrieval corpus. Our results on the LLaMA and Atlas models show that both strategies reduce inconsistency, while retrieval augmentation is considerably more efficient. We further consider and disentangle the consistency contributions of different components of Atlas. For all LMs evaluated, we find that syntactic form and other evaluation task artifacts impact consistency. Taken together, our results provide a better understanding of the factors affecting the factual consistency of language models.

Authors

Lovisa Hagström

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Denitsa Saynova

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Tobias Norlund

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Moa Johansson

AoA Information and Communications technology

Richard Johansson

University of Gothenburg

EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings

pp. 5457-5476
9798891760608 (ISBN)

2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
Hybrid, Singapore, Singapore

Subject Categories (SSIF 2011)

Specific Languages

DOI

10.18653/v1/2023.emnlp-main.332


Latest update

9/23/2024