Improving Language Models Using Augmentation and Multi-Modality
Licentiate thesis, 2023

Language models have become a core component of modern Natural Language Processing (NLP), as they constitute a powerful base that is easily adaptable to many language processing tasks. Part of their strength lies in their ability to embed associations representing general world knowledge. However, the associations formed by these models are brittle, even when the models are scaled to huge sizes and trained on massive amounts of data. This, in combination with other problems such as lack of attributability and high costs, motivates us to investigate other methods to improve on these aspects.

In this thesis, we investigate methods that augment language models with additional contextual information, with the aim of simplifying the language modeling problem and encouraging the formation of desirable associations. We also investigate whether multi-modal data can assist in forming associations that could otherwise be difficult to obtain from textual data alone.

In our experiments, we show that augmentation is effective toward these ends, in both a textual and a multi-modal setting. We also demonstrate that visual data can assist in forming knowledge-representing associations in a language model.

natural language processing

contextual augmentation

multimodal language modeling

language models

Room Analysen, EDIT Building, Hörsalsvägen 11
Opponent: Pontus Stenetorp, Associate Professor, University College London, United Kingdom


Tobias Norlund

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Building a Swedish Open-Domain Conversational Language Model

Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa) (2021), p. 357-366

Paper in proceeding

Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?

BlackboxNLP 2021 - Proceedings of the 4th BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (2021), p. 149-162

Paper in proceeding

Cross-modal Transfer Between Vision and Language for Protest Detection

CASE 2022 - 5th Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text, Proceedings of the Workshop (2022), p. 56-60

Paper in proceeding

On the Generalization Ability of Retrieval-Enhanced Transformers

EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023 (2023), p. 1455-1463

Paper in proceeding

Subject Categories

Other Computer and Information Science

Language Technology (Computational Linguistics)

Computer Science



