Part-of-Speech Taggers Make Errors on Unambiguous Sentences
Paper in proceedings, 2025

We show that commonly used part-of-speech (POS) taggers, despite their high reported performance, in many cases make tagging errors on simple and unambiguous sentences. We collect a new data set of non-ambiguous sentences that can easily be tagged by human taggers, but where at least one standard POS tagger makes precisely one tagging error. Furthermore, we present a method for generating rules that are meant to correct the output of a standard POS tagger. Applying this method to the new data set, we extract a set of such rules, which are then evaluated over another data set introduced in earlier work. Our results show that the method works, but also that the increase in tagging accuracy is rather small, probably due to the small size of our training data set. Finally, we present an analysis of POS tagging in general, concluding that there are multiple ambiguities that introduce unresolvable challenges in POS tagging.

Part-of-speech tagging

Natural language processing

Rule-based systems

Authors

Minerva Suvanto

Chalmers, Mechanics and Maritime Sciences, Vehicle Engineering and Autonomous Systems

Mattias Wahde

Chalmers, Mechanics and Maritime Sciences, Vehicle Engineering and Autonomous Systems

Marco L. Della Vedova

Chalmers, Mechanics and Maritime Sciences, Vehicle Engineering and Autonomous Systems

Lecture Notes in Computer Science

0302-9743 (ISSN) 1611-3349 (eISSN)

Vol. 15591 LNAI 207-221
9783031873263 (ISBN)

16th International Conference on Agents and Artificial Intelligence, ICAART 2024
Rome, Italy

Subject categories (SSIF 2025)

Language Processing and Computational Linguistics

Computer Science

DOI

10.1007/978-3-031-87327-0_10

More information

Last updated

2025-06-27