Part-of-Speech Taggers Make Errors on Unambiguous Sentences
Paper in proceeding, 2025

We show that commonly used part-of-speech (POS) taggers, despite their high reported performance, in many cases make tagging errors on simple and unambiguous sentences. We collect a new data set of non-ambiguous sentences that can easily be tagged by human taggers, but where at least one standard POS tagger makes precisely one tagging error. Furthermore, we present a method for generating rules that are meant to correct the output of a standard POS tagger. Applying this method to the new data set, we extract a set of such rules, which are then evaluated over another data set introduced in earlier work. Our results show that the method works, but also that the increase in tagging accuracy is rather small, probably due to the small size of our training data set. Finally, we present an analysis of POS tagging in general, concluding that there are multiple ambiguities that introduce unresolvable challenges in POS tagging.
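The abstract does not specify the rule format, so the following is only a minimal sketch of the general idea of correcting a tagger's output with rules extracted from sentences containing exactly one tagging error. The `CorrectionRule` class, the `learn_rule` function, the single-previous-tag context, the Universal-POS-style tag names, and the toy sentences are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Tagged = List[Tuple[str, str]]  # (token, tag) pairs


@dataclass(frozen=True)
class CorrectionRule:
    """Hypothetical rule: retag `word` from `wrong` to `right`
    when the preceding tag equals `prev_tag` (None = any context)."""
    word: str
    wrong: str
    right: str
    prev_tag: Optional[str] = None

    def apply(self, sentence: Tagged) -> Tagged:
        out = list(sentence)
        for i, (tok, tag) in enumerate(sentence):
            prev = sentence[i - 1][1] if i > 0 else None
            context_ok = self.prev_tag is None or self.prev_tag == prev
            if tok.lower() == self.word and tag == self.wrong and context_ok:
                out[i] = (tok, self.right)
        return out


def learn_rule(predicted: Tagged, gold: Tagged) -> Optional[CorrectionRule]:
    """Extract one rule from a sentence where the tagger made exactly one error."""
    diffs = [i for i, (p, g) in enumerate(zip(predicted, gold)) if p[1] != g[1]]
    if len(diffs) != 1:
        return None  # zero or several errors: outside this sketch's scope
    i = diffs[0]
    # A sentence-initial error yields prev_tag=None, i.e. a context-free rule.
    prev = predicted[i - 1][1] if i > 0 else None
    return CorrectionRule(predicted[i][0].lower(), predicted[i][1], gold[i][1], prev)


# Toy data (invented): the tagger output mislabels "walk" after a determiner.
predicted = [("The", "DET"), ("walk", "VERB"), ("was", "AUX"), ("long", "ADJ")]
gold = [("The", "DET"), ("walk", "NOUN"), ("was", "AUX"), ("long", "ADJ")]
rule = learn_rule(predicted, gold)

fixed = rule.apply([("A", "DET"), ("walk", "VERB")])
untouched = rule.apply([("They", "PRON"), ("walk", "VERB")])
```

In this sketch the learned rule rewrites `walk` to `NOUN` only after a determiner, so a sentence such as "They walk" is left unchanged. A rule set learned from few sentences covers few contexts, which is consistent with the small accuracy gain the abstract reports.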

Part-of-speech tagging

Natural language processing

Rule-based systems

Author

Minerva Suvanto

Chalmers, Mechanics and Maritime Sciences (M2), Vehicle Engineering and Autonomous Systems

Mattias Wahde

Chalmers, Mechanics and Maritime Sciences (M2), Vehicle Engineering and Autonomous Systems

Marco L. Della Vedova

Chalmers, Mechanics and Maritime Sciences (M2), Vehicle Engineering and Autonomous Systems

Lecture Notes in Computer Science

0302-9743 (ISSN) 1611-3349 (eISSN)

Vol. 15591 LNAI 207-221
9783031873263 (ISBN)

16th International Conference on Agents and Artificial Intelligence, ICAART 2024
Rome, Italy

Subject Categories (SSIF 2025)

Natural Language Processing

Computer Sciences

DOI

10.1007/978-3-031-87327-0_10

More information

Latest update

6/27/2025