Bootstrapping Language Description: The case of Mpiemo (Bantu A, Central African Republic)
Paper in proceedings, 2008

Linguists have long produced grammatical descriptions of previously undescribed languages. This is a time-consuming process, one that has already adapted to improved technology for recording and storage. We present here a novel application of NLP techniques to bootstrap the analysis of collected data and speed up manual selection work. More precisely, we argue that unsupervised induction of morphology and part-of-speech analysis from raw text data is mature enough to produce useful results. Experiments with Latent Semantic Analysis were less fruitful. We exemplify this on Mpiemo, a so far essentially undescribed Bantu language of the Central African Republic, for which raw text data was available.

Acquisition

Endangered languages

Language modelling

Machine Learning

Authors

Harald Hammarström

Göteborgs universitet

Christina Thornell

Göteborgs universitet

Malin Petzell

Göteborgs universitet

Torbjörn Westerlund

Proceedings of the 6th edition of the Language Resources and Evaluation Conference (LREC 2008), 28-30 May 2008, Marrakech, Morocco

Subject categories

Studies of specific languages

Computer science