A naive theory of affixation and an algorithm for extraction
Paper in proceedings, 2006

We present a novel approach to the unsupervised detection of affixes, that is, the extraction of a set of salient prefixes and suffixes from an unlabeled corpus of a language. The underlying theory makes no assumptions about whether the language is morphologically rich or poor, whether it is prefixing or suffixing, or whether its affixes are long or short. It does, however, assume that (1) salient affixes must be frequent, i.e., they occur much more often than random segments of the same length, and (2) words are essentially variable-length sequences of random characters, e.g., a character should not occur in far more words than chance predicts unless there is a reason, such as being part of a very frequent affix. The affix extraction algorithm uses only information from fluctuations in frequencies, runs in linear time, and is free from thresholds and opaque iterations. We demonstrate the usefulness of the approach with case studies on typologically distant languages.
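The abstract does not spell out the extraction algorithm itself, so the following is only a minimal sketch of assumption (1), not the paper's method. It scores every word-final segment by the ratio of its observed count to the count expected if characters were drawn independently from the corpus letter distribution. The function name `suffix_salience`, the `max_len` cap, and the independent-character baseline are all illustrative assumptions. The sketch ranks candidates rather than applying a cutoff.

```python
from collections import Counter
from math import prod


def suffix_salience(words, max_len=4):
    """Score word-final segments by observed / expected frequency.

    Illustrative assumption: the expected count treats each character as
    drawn independently from the corpus unigram distribution. This is a
    hypothetical baseline, not the algorithm from the paper.
    """
    chars = Counter(c for w in words for c in w)
    total = sum(chars.values())
    p = {c: n / total for c, n in chars.items()}

    scores = {}
    for k in range(1, max_len + 1):
        # Only words longer than k can carry a k-character suffix
        # while still leaving a non-empty stem.
        eligible = [w for w in words if len(w) > k]
        observed = Counter(w[-k:] for w in eligible)
        for suffix, n in observed.items():
            expected = len(eligible) * prod(p[c] for c in suffix)
            scores[suffix] = n / expected
    return scores


words = ["walked", "talked", "jumped", "walking", "talking", "jumping", "cat", "dog"]
ranked = sorted(suffix_salience(words).items(), key=lambda kv: kv[1], reverse=True)
print(ranked[:5])
```

On this toy list, "ed" and the unrelated ending "og" receive essentially the same score (about 63), because the rare letter "o" makes "og" look improbable under the baseline. A raw frequency ratio like this is therefore only a starting point for affix detection, not a complete detector.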

Author

Harald Hammarström

Chalmers, Computer Science and Engineering

HLT-NAACL 2006 - SIGPHON 2006: 8th Meeting of the ACL Special Interest Group on Computational Phonology, Proceedings of the Workshop

pp. 79-88

8th Meeting of the ACL Special Interest Group on Computational Phonology, SIGPHON 2006, co-located with HLT-NAACL 2006
New York City, USA

Subject Categories

Language Technology (Computational Linguistics)

General Language Studies and Linguistics

Specific Languages

Latest update

12/9/2021