Context Is Complex: From Dialogue Histories to Knowledge Integration in NLP
Licentiate thesis, 2024

Language Models (LMs) have shown impressive results on many Natural Language Processing (NLP) tasks. Recent advances in scaling up these models into large language models (LLMs) suggest that they can serve as relatively reliable tools for assisting humans. However, do these models truly "understand" context, in the sense of knowing it? To answer this question, we first need a definition of context. There is no universal definition, which is part of what makes context a complex concept; in NLP, it can take many forms, including dialogue exchanges, external knowledge, and linguistic structure, among others.

In this thesis, we address the complexity of context from the perspective of LMs through two central research questions: (1) How can LMs better incorporate dialogue histories and personas in conversational AI tasks? (2) How do LMs balance internal and external knowledge, and when do they prioritize one over the other? We present two studies that address these questions from different perspectives. First, we introduce a new training strategy that encourages the model to consider context in its responses. Then, we apply dissection methods, such as causal mediation analysis, to explore the internal mechanisms of LMs and understand how they interact with context.

Our findings from the first study show that introducing such a training strategy can slightly improve the model's overall performance. However, this improvement does not indicate that the model consistently considers context in its responses. In contrast, the second study provides a clearer understanding of how the model interacts with context: the model first evaluates the context to determine whether it is relevant and, if it is deemed appropriate, incorporates it into its response. In such cases, the model tends to rely heavily on the context, often ignoring its internal knowledge.

Keywords

model analysis

context

retrieval-augmented models

causal mediation analysis

HC1, Hörsalar HC, Hörsalsvägen 14, Chalmers University of Technology, Campus Johanneberg
Opponent: Prof. Anna Rogers, IT University of Copenhagen

Author

Mehrdad Farahani

Chalmers, Computer Science and Engineering, Data Science and AI

An Empirical Study of Multitask Learning to Improve Open Domain Dialogue Systems

Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa) (2023), pp. 347-357

Paper in proceedings

Deciphering the Interplay of Parametric and Non-parametric Memory in Retrieval-augmented Language Models

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024), pp. 16966-16977

Paper in proceedings

Subject categories

Computer Engineering

Language Technology (Computational Linguistics)

Computer Sciences

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

Publisher

Chalmers



Last updated

2024-12-12