Lexical and Grammar Resource Engineering for Runyankore & Rukiga: A Symbolic Approach
Licentiate thesis, 2021

Current research in computational linguistics and natural language processing (NLP) depends on the existence of language resources. Whereas such resources are available for a few well-resourced languages, many other languages have been neglected. Among the neglected and/or under-resourced languages are Runyankore and Rukiga (henceforth referred to as Ry/Rk). Recently, the NLP community has started to acknowledge that resources for under-resourced languages should also be given priority. One reason is that, as far as language typology is concerned, the few well-resourced languages do not represent the structural diversity of the remaining languages.

This thesis focuses on enabling the computational analysis and generation of utterances in Ry/Rk. Ry/Rk are two closely related languages spoken by about 3.4 and 2.4 million people, respectively. They belong to the Nyoro-Ganda (JE10) language zone of the Great Lakes, Narrow Bantu, within the Niger-Congo language family.

Computational processing of these languages is achieved by formalising their grammars using Grammatical Framework (GF) and its Resource Grammar Library (RGL). In addition to the grammars, a general-purpose computational lexicon for the two languages is developed. Although we use the lexicon primarily to increase the lexical coverage of the grammars substantially, it can also be used for other NLP tasks.

In this thesis, a symbolic (rule-based) approach is taken because the lack of adequate language resources makes data-driven NLP approaches unsuitable for these languages.

Computational Grammar

Runyankore

Grammar Resource

Grammatical Framework

Lexical Resource

Computational lexicon

Rukiga

Bantu Languages

Runyakitara

Resource Grammar Library

Language Resources

Grammar Engineering

CSE EDIT 8103 and online via Zoom
Opponent: Dr. Wanjiku Ng'ang'a, School of Computing and Informatics, University of Nairobi, Kenya

Author

David Bamutura

Chalmers, Computer Science and Engineering, Functional Programming

Towards Computational Resource Grammars for Runyankore and Rukiga

LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings (2020), p. 2846-2854

Paper in proceedings

Bamutura, Sabiiti David (2021). "Ry/Rk-Lex: A computational lexicon for Runyankore and Rukiga languages". Accepted to the Northern European Association for Language Technology post-proceedings series of the Swedish Language Technology Conference (SLTC 2020).

Areas of Advance

Information and Communication Technology

Driving Forces

Sustainable development

Innovation and entrepreneurship

Subject Categories

Language Technology (Computational Linguistics)

General Language Studies and Linguistics

Specific Languages

Publisher

Chalmers

More information

Latest update

2021-06-04