Representations, Retrieval, and Evaluation in Knowledge-Intensive Natural Language Processing
Doctoral thesis, 2025

Several major advancements have recently been made within the field of Natural Language Processing (NLP). Nowadays, NLP systems based on language models (LMs) are readily available to the public in the form of chatbots, code assistants, writing assistants, etc. Any task that can be described in text can be, and is, addressed by NLP systems, covering expected as well as less expected tasks. While these advancements have highlighted many strengths of NLP systems, they have also exposed weaknesses that hinder their use in certain scenarios. For example, modern NLP systems are neither reliable nor interpretable, limiting their usefulness for e.g. knowledge-intensive or high-risk tasks. In this thesis, we focus on the application of NLP systems to knowledge-intensive situations. We consider how methods leveraging different types of representations of information, such as the parametric memory of a model trained on multimodal information or retrieval-augmented generation (RAG), can be used to improve these systems. We find that RAG can improve the stability of NLP systems for knowledge-intensive tasks, and that larger LMs are generally more effective at leveraging the external information provided by RAG. We also develop datasets and methods that allow for more comprehensive and precise evaluations of NLP systems in knowledge-intensive situations. We find that insights gained from synthesised evaluation datasets are not guaranteed to transfer to real-world scenarios, and that evaluation results are sensitive to how the knowledge under consideration interacts with the parametric memory of the LM. Taken together, the work included in this thesis improves our understanding of NLP systems for knowledge-intensive situations and highlights the important role of representations of information, as well as of realistic benchmarks, for NLP.

Context Utilisation

Knowledge-intensive Tasks

Vision-and-Language Models

Retrieval-Augmented Generation

Mechanistic Interpretability

Language Models

Evaluation

Natural Language Processing

MC-salen, Hörsalsvägen 5.
Opponent: Prof. Ivan Vulić, Language Technology Lab, University of Cambridge, England.

Author

Lovisa Hagström

Data Science and AI 2

Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?

BlackboxNLP 2021 - Proceedings of the 4th BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (2021), pp. 149-162

Paper in proceedings

The Effect of Scaling, Retrieval Augmentation and Form on the Factual Consistency of Language Models

EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (2023), pp. 5457-5476

Paper in proceedings

A Reality Check on Context Utilisation for Retrieval-Augmented Generation

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2025), pp. 19691-19730

Paper in proceedings

Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion

Findings of the Association for Computational Linguistics: ACL 2025 (2025), pp. 18322-18349

Paper in proceedings

L. Hagström, Y. Kim, H. Yu, S. Lee, R. Johansson, H. Cho, I. Augenstein. CUB: Benchmarking Context Utilisation Techniques for Language Models.

Several major advancements have recently been made within the field of Natural Language Processing (NLP). Nowadays, NLP systems based on language models (LMs) are readily available to the public in the form of chatbots, code assistants, writing assistants, etc. These systems can tackle almost any task that can be described in text — ranging from the expected (like translation or summarisation) to the surprising (like creative writing or software debugging). While these advancements have revealed impressive capabilities of NLP systems, they have also exposed significant weaknesses. Modern NLP models are often unreliable and lack interpretability — they can generate incorrect outputs that appear plausible, and we have little insight into how or why those outputs were produced. This poses a serious limitation in knowledge-intensive or high-risk domains, where answers are difficult to verify and mistakes can be costly. In this thesis, we investigate system designs and evaluation strategies that can help lay the groundwork for more reliable NLP systems.

Subject categories (SSIF 2025)

Natural language processing and computational linguistics

Computer science

ISBN

978-91-8103-250-5

Doctoral theses at Chalmers University of Technology. New series: 5708

Publisher

Chalmers

Online

More information

Last updated

2025-08-25