Context Is Complex: From Dialogue Histories to Knowledge Integration in NLP
Licentiate thesis, 2024
In this thesis, we address the complexity of context from the point of view of language models (LMs) through two central research questions: (1) How can LMs better incorporate dialogue histories and personas in conversational AI tasks? (2) How do LMs balance internal and external knowledge, and when do they prioritize one over the other? We present two studies that address these questions from different perspectives. First, we introduce a new training strategy that encourages the model to consider context in its responses. Then, we apply dissection methods, such as causal mediation analysis, to explore the internal mechanisms of LMs and understand how they interact with context.
Our findings from the first study show that the proposed training strategy can slightly improve the model's overall performance. However, the results do not show that the model consistently considers context in its responses. In contrast, the second study gives a clearer picture of how the model interacts with context. The model first evaluates whether the context is relevant and, if it deems the context appropriate, incorporates it into its response. In such cases, the model tends to rely heavily on the context, often ignoring its internal knowledge.
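To make the causal mediation analysis mentioned above more concrete, here is a minimal Python sketch of its basic intervention, often called activation patching. The sketch runs a model on a clean and a corrupted input, restores one hidden unit to its clean value inside the corrupted run, and measures how much of the output change that unit recovers (its indirect effect). The two-unit model, the unit names `h_context` and `h_memory`, and the input values are hypothetical and chosen only for illustration. They are not the models or the exact procedure used in the thesis.

```python
"""Toy causal mediation analysis (activation patching) on a hand-built model.

Hypothetical illustration: the two-unit "model" below is invented for this
sketch and is not the model or setup used in the thesis.
"""
from typing import Dict, Optional, Tuple


def forward(x: Tuple[float, float],
            patch: Optional[Dict[str, float]] = None) -> Tuple[float, Dict[str, float]]:
    """Run the toy model, optionally overwriting hidden units with given values."""
    hidden = {
        "h_context": 2.0 * x[0],  # hypothetical unit reading a "context" feature
        "h_memory": 0.5 * x[1],   # hypothetical unit reading an "internal knowledge" feature
    }
    if patch:
        hidden.update(patch)      # intervention: replace mediator activations
    output = hidden["h_context"] + hidden["h_memory"]
    return output, hidden


def indirect_effect(clean: Tuple[float, float],
                    corrupted: Tuple[float, float],
                    unit: str) -> float:
    """Output change from restoring one mediator to its clean value in the corrupted run."""
    _, clean_hidden = forward(clean)
    y_corrupted, _ = forward(corrupted)
    y_patched, _ = forward(corrupted, patch={unit: clean_hidden[unit]})
    return y_patched - y_corrupted


clean = (1.0, 1.0)
corrupted = (0.0, 1.0)  # only the context feature is corrupted

total_effect = forward(clean)[0] - forward(corrupted)[0]
for unit in ("h_context", "h_memory"):
    print(unit, indirect_effect(clean, corrupted, unit))
print("total", total_effect)
```

In this toy setup, restoring `h_context` recovers the full total effect of 2.0, while restoring `h_memory` recovers nothing. This kind of signal is what attributes a behavior to specific internal components. In a real LM, the same idea is applied to hidden states at particular layers and token positions, typically by intercepting the forward pass.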
Keywords
model analysis
context
retrieval-augmented models
causal mediation analysis
Author
Mehrdad Farahani
Chalmers, Computer Science and Engineering, Data Science and AI
An Empirical Study of Multitask Learning to Improve Open Domain Dialogue Systems
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa) (2023), pp. 347-357
Paper in proceedings
Deciphering the Interplay of Parametric and Non-parametric Memory in Retrieval-augmented Language Models
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024), pp. 16966-16977
Paper in proceedings
Subject categories
Computer Engineering
Language Technology (Computational Linguistics)
Computer Science
Infrastructure
C3SE (Chalmers Centre for Computational Science and Engineering)
Publisher
Chalmers
HC1, Lecture Halls HC, Hörsalsvägen 14, Chalmers University of Technology, Campus Johanneberg
Opponent: Prof. Anna Rogers, IT University of Copenhagen