Fine-Tuning Language Models on Dutch Protest Event Tweets
Paper in proceedings, 2024

Obtaining timely information about events such as protests is becoming increasingly relevant with the rise of affective polarisation and social unrest around the world. Nowadays, large-scale protests tend to be organised and broadcast through social media, and analysing platforms like X has proven to be an effective method for following events during a protest. We therefore fine-tuned several language models on Dutch tweets and analysed their ability to classify whether a tweet expresses discontent, since such tweets may contain practical information about a protest. Our results show that models pre-trained on Twitter data, including Bernice and TwHIN-BERT, outperform models that were not. Additionally, the results indicate that Sentence Transformers is a promising model. The added value of oversampling is greater for models that were not pre-trained on Twitter data. In line with previous work, preprocessing the data did not improve the predictions of transformer language models.
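The abstract does not specify how oversampling was performed. As a hedged illustration only, the sketch below shows plain random oversampling, which duplicates randomly chosen minority-class examples until every class is as large as the biggest one. The function name `random_oversample`, the label encoding (1 = expresses discontent, 0 = does not) and the example tweets are hypothetical, and this is not necessarily the procedure the authors used.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen examples of each smaller
    class until every class has as many examples as the largest class."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    rng = random.Random(seed)  # fixed seed so the duplication is reproducible
    counts = Counter(labels)
    target = max(counts.values())
    # Start from the original data; duplicates are appended at the end.
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        # Indices of the original examples that belong to this class.
        idx = [i for i, y in enumerate(labels) if y == label]
        for _ in range(target - count):
            j = rng.choice(idx)  # sample with replacement from this class
            out_texts.append(texts[j])
            out_labels.append(labels[j])
    return out_texts, out_labels


# Invented example tweets; 1 = expresses discontent (hypothetical encoding).
tweets = [
    "Wat een mooie dag",
    "Lekker weer vandaag",
    "Ik ben het zat, iedereen naar het Malieveld!",
    "Goedemorgen allemaal",
]
labels = [0, 0, 1, 0]
balanced_tweets, balanced_labels = random_oversample(tweets, labels)
```

After the call, both classes contain three examples. Random oversampling should be applied only to the training split: duplicating examples before the train/test split would let copies of the same tweet appear in both sets and inflate the evaluation scores.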

Authors

Meagan Loerakker

Chalmers, Computer Science and Engineering, Interaction Design and Software Engineering

Netherlands Police

Laurens H.F. Müter

Netherlands Police

Universiteit Utrecht

Marijn P. Schraagen

Universiteit Utrecht

CASE 2024 - 7th Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text, Proceedings of the Workshop

Pages 6-23
9798891760707 (ISBN)

7th Workshop on Challenges and Applications of Automated Extraction of Socio-Political Events from Text, CASE 2024
St. Julian's, Malta,

PAPACUI: Skill-based adaptation in interactive systems for physical activity

Vetenskapsrådet (VR, Swedish Research Council) (2022-03196), 2023-01-01 -- 2026-12-31.

Subject categories

Language Technology (Computational Linguistics)

General Language Studies and Linguistics

Specific Languages


Last updated

2024-08-07