Context Is Complex: From Dialogue Histories to Knowledge Integration in NLP
Licentiate thesis, 2024

Language Models (LMs) have shown impressive results on many Natural Language Processing (NLP) tasks, and recent advances in scaling them up into Large Language Models (LLMs) suggest they can be relatively reliable tools for assisting humans. But do these models truly "understand" the context they are given? Answering this question first requires a definition of context. There is no universal definition, which is part of what makes context a complex concept; in NLP, it can take many forms, including exchanged conversations, external knowledge, linguistic structure, and more.

In this thesis, we address the complexity of context from an LM point of view through two central research questions: (1) How can LMs better incorporate dialogue histories and personas in conversational AI tasks? (2) How do LMs balance internal and external knowledge, and when do they prioritize one over the other? We present two studies that approach these questions from different perspectives. First, we introduce a new training strategy that encourages the model to take context into account in its responses. Second, we apply dissection methods, such as causal mediation analysis, to explore the internal mechanisms of LMs and understand how they interact with context.
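
As a concrete illustration of the second approach, the sketch below patches a hidden state from a "clean" run into a "corrupted" run of a small causal LM and measures how much of the original prediction is recovered; this recovered probability is the indirect effect at the heart of causal mediation analysis. This is a minimal sketch rather than the thesis's actual experimental setup: the choice of GPT-2, the toy France/Italy prompts, and the patched layer and position are all illustrative assumptions.

```python
# Minimal causal-mediation sketch: patch one clean hidden state into a
# corrupted run and measure how much of the prediction it restores.
# Model, prompts, and patched location are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

clean = tok("The capital of France is", return_tensors="pt")
corrupt = tok("The capital of Italy is", return_tensors="pt")  # same length
paris = tok(" Paris")["input_ids"][0]  # token whose probability we track

def p_target(logits):
    """Probability of the target token at the last position."""
    return torch.softmax(logits[0, -1], dim=-1)[paris].item()

with torch.no_grad():
    clean_out = model(**clean, output_hidden_states=True)
    corrupt_out = model(**corrupt)
p_clean, p_corrupt = p_target(clean_out.logits), p_target(corrupt_out.logits)

# hidden_states[k + 1] is the output of transformer block k; cache block
# `layer`'s clean activation at the last position so we can splice it in.
layer, pos = 6, -1
clean_h = clean_out.hidden_states[layer + 1][:, pos, :]

def patch(module, inputs, output):
    output[0][:, pos, :] = clean_h  # overwrite the mediator's activation
    return output

handle = model.transformer.h[layer].register_forward_hook(patch)
with torch.no_grad():
    patched_out = model(**corrupt)
handle.remove()

p_patched = p_target(patched_out.logits)
print(f"P(' Paris'): clean={p_clean:.3f}, corrupt={p_corrupt:.3f}, "
      f"patched={p_patched:.3f}")
print(f"Indirect effect of layer {layer} at pos {pos}: "
      f"{p_patched - p_corrupt:.3f}")
```

Sweeping the patched layer and position over the whole grid, instead of fixing one pair as here, yields the usual mediation heatmap showing where the model stores and moves the decisive information.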

Our findings from the first study show that a suitable training strategy can slightly improve the model's overall performance; however, this improvement does not mean the model consistently takes context into account in its responses. In contrast, the second study gives a clearer picture of how the model interacts with context: the model first evaluates the context to assess its relevance and, if it is deemed appropriate, incorporates it into its responses. In such cases, the model tends to rely heavily on the context, often ignoring its internal knowledge.
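
The behavioural side of this finding can be probed with a simple counterfactual test: if a context that contradicts the model's parametric knowledge flips its answer, the model is following the context rather than its memory. The sketch below is a hedged illustration of that idea, not the thesis's experimental pipeline; the Flan-T5 checkpoint and the prompt format are assumptions made for the example.

```python
# Counterfactual-context probe: does the model answer from the supplied
# context or from its parametric memory? The model choice and prompt
# format are illustrative assumptions, not the thesis's actual setup.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small").eval()

question = "What is the capital of France?"
conditions = {
    "closed-book": "",
    "supporting": "Context: The capital of France is Paris. ",
    "counterfactual": "Context: The capital of France is Lyon. ",
}

for name, ctx in conditions.items():
    prompt = f"{ctx}Question: {question} Answer:"
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=5)
    print(f"{name:>14}: {tok.decode(out[0], skip_special_tokens=True)}")

# If the counterfactual condition yields "Lyon", the model is relying on
# the context; if it still says "Paris", parametric memory is winning.
```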

Keywords: model analysis, context, retrieval-augmented models, causal mediation analysis

HC1, Hörsalar HC, Hörsalsvägen 14, Chalmers University of Technology, Campus Johanneberg (also online)
Opponent: Prof. Anna Rogers, IT University of Copenhagen

Author

Mehrdad Farahani

Chalmers, Computer Science and Engineering, Data Science and AI

An Empirical Study of Multitask Learning to Improve Open Domain Dialogue Systems

Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa) (2023), pp. 347–357

Paper in proceeding

Deciphering the Interplay of Parametric and Non-parametric Memory in Retrieval-augmented Language Models

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024), pp. 16966–16977

Paper in proceeding

Subject Categories

Computer Engineering

Language Technology (Computational Linguistics)

Computer Science

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

Publisher

Chalmers

More information

Latest update: 12/12/2024