Enhancing glass-box methods for part-of-speech tagging and text classification

Minerva Suvanto

Licentiate thesis, 2025

This thesis explores interpretable methods for two natural language processing (NLP) tasks: part-of-speech (POS) tagging and text classification. The NLP field is currently centered on the development and deployment of deep neural networks (DNNs), which have become established as the state of the art and perform well on a variety of benchmarks. However, these models are to all intents and purposes black boxes that lack interpretability, a major disadvantage in high-stakes situations where transparency is essential.

The focus of this thesis is therefore on enhancing inherently transparent (glass-box) methods, thus establishing a foundation for their use in high-stakes scenarios. First, the task of POS tagging is examined. Although it is often regarded as a solved problem, the findings here show that several challenges remain. Based on these findings, a rule-based approach is explored for correcting the output of POS taggers. Second, interpretable text classification is studied using an enhanced linear classification method. The results demonstrate that a fully interpretable classifier can achieve high performance with the proposed enhancements, approaching that of pretrained DNN-based methods.

glass-box methods

part-of-speech tagging

interpretability

natural language processing

text classification

HC1, Hörsalsvägen 14, Chalmers

Opponent: Dr. Peter Barclay, Edinburgh Napier University, Scotland

Author

Minerva Suvanto

Chalmers, Mechanics and Maritime Sciences (M2), Vehicle Engineering and Autonomous Systems

Other publications Research

Suvanto M., Wahde M., and Della Vedova M. L., Part-of-speech Taggers Make Errors on Unambiguous Sentences

Suvanto M. and Wahde M., Improving glass-box sentiment classification via feature set extension

Subject Categories (SSIF 2025)

Natural Language Processing

Publisher

Chalmers