Automatic Annotation of Bibliographical References with Target Language
Paper in proceedings, 2008

In a large-scale project to list bibliographical references to all of the ca 7 000 languages of the world, the need arises to automatically annotate the bibliographical entries with ISO-639-3 language identifiers. The task can be seen as a special case of a more general Information Extraction problem: to classify short text snippets in various languages into a large number of classes. We will explore supervised and unsupervised approaches motivated by distributional characteristics of the specific domain and availability of data sets. In all cases, we make use of a database with language names and identifiers. The suggested methods are rigorously evaluated on a fresh representative data set.

Author

Harald Hammarström

University of Gothenburg

Coling 2008: Proceedings of MMIES-2: Workshop on Multi-source, Multilingual Information Extraction and Summarization; August 2008, Manchester

57-64

Subject Categories

Computer Science

More information

Created

10/10/2017