A First Attempt at Unreliable News Detection in Swedish
Paper in proceeding, 2022

Throughout the COVID-19 pandemic, a parallel infodemic has been under way, with information spreading faster than the virus itself. During this time, every individual needs access to accurate news in order to take appropriate protective measures, regardless of their country of origin or the language they speak, since misinformation can cause significant harm not only to individuals but also to society. In this paper we train several machine learning models, ranging from traditional machine learning to deep learning, to determine whether a news article comes from a reliable or an unreliable source, using only the body of the article. For this classification task we use a previously introduced corpus of Swedish news related to the COVID-19 pandemic. Because our dataset is both unbalanced and small, we use subsampling and easy data augmentation (EDA) to mitigate these issues. We find that, owing to the small size of our dataset, traditional machine learning combined with data augmentation yields results that rival those of transformer models such as BERT.
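The abstract names two remedies for a small, unbalanced dataset: subsampling and easy data augmentation (EDA). The sketch below illustrates the general idea only and is not the authors' code. It assumes that subsampling means randomly reducing every class to the size of the smallest one, and it shows just two of the EDA operations (random swap and random deletion), leaving out the synonym-based ones because they need a thesaurus. All function names and parameter values are hypothetical.

```python
import random
from collections import defaultdict


def undersample(texts, labels, rng):
    """Randomly drop examples so every class has as many items as the smallest class."""
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    n = min(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        for text in rng.sample(items, n):
            balanced.append((text, label))
    rng.shuffle(balanced)
    return balanced


def random_swap(words, n_swaps, rng):
    """EDA-style random swap: exchange two randomly chosen positions, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """EDA-style random deletion: drop each word with probability p, keeping at least one."""
    words = list(words)
    if not words:
        return words
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def augment(text, rng, n_swaps=1, p_delete=0.1):
    """Return two augmented variants of a text (assumed values for n_swaps and p_delete)."""
    words = text.split()
    return [
        " ".join(random_swap(words, n_swaps, rng)),
        " ".join(random_deletion(words, p_delete, rng)),
    ]
```

In a pipeline along these lines, the balanced and augmented texts would then be passed to a feature extractor and a classifier; which ones the paper used is not stated in this record.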

Statistical

Text categorisation

Machine Learning Methods

Less-resourced languages

Author

Ricardo Muñoz Sánchez

University of Gothenburg

Eric Johansson

Student at Chalmers

Shakila Tayefeh

Student at Chalmers

Shreyash Kad

Student at Chalmers

International Workshop on Resources and Techniques for User Information in Abusive Language Analysis, ResT-UP 2022 - in conjunction with the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings

1-7
9791095546993 (ISBN)

2nd International Workshop on Resources and Techniques for User Information in Abusive Language Analysis, ResT-UP 2022
Marseille, France

Subject Categories

Language Technology (Computational Linguistics)

More information

Latest update

10/27/2023