Improving Language Models Using Augmentation and Multi-Modality

Tobias Norlund

Licentiate thesis, 2023

Language models have become a core component of modern Natural Language Processing (NLP), as they constitute a powerful base that is easily adaptable to many language processing tasks. Part of their strength lies in their ability to embed associations representing general world knowledge. However, the associations formed by these models are brittle, even when the models are scaled to huge sizes and trained on massive amounts of data. This, in combination with other problems such as lack of attributability and high costs, motivates us to investigate other methods to improve on these aspects.

In this thesis, we investigate methods that augment language models with additional contextual information, with the aim of simplifying the language modeling problem and promoting the formation of desirable associations. We also investigate whether multi-modal data can assist in forming associations that could otherwise be difficult to obtain from textual data alone.

In our experiments, we show that augmentation is effective toward these ends in both the textual and the multi-modal case. We also demonstrate that visual data can assist in forming knowledge-representing associations in a language model.

natural language processing

contextual augmentation

multimodal language modeling

language models

Room Analysen, EDIT Building, Hörsalsvägen 11

Opponent: Pontus Stenetorp, Associate Professor, University College London, United Kingdom

Online defence

Author

Tobias Norlund

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI


Building a Swedish Open-Domain Conversational Language Model

Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa) (2021), p. 357-366

Paper in proceeding

Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?

BlackboxNLP 2021 - Proceedings of the 4th BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (2021), p. 149-162

Paper in proceeding

Cross-modal Transfer Between Vision and Language for Protest Detection

CASE 2022 - 5th Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text, Proceedings of the Workshop (2022), p. 56-60

Paper in proceeding

On the Generalization Ability of Retrieval-Enhanced Transformers

EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023 (2023), p. 1455-1463

Paper in proceeding

Subject Categories

Other Computer and Information Science

Language Technology (Computational Linguistics)

Computer Science

Publisher

Chalmers


More information

Latest update

4/24/2023
