A Deep Learning System for Automatic Extraction of Typological Linguistic Information from Descriptive Grammars
Paper in proceedings, 2021

Linguistic typology is an area of linguistics concerned with the analysis and comparison of the world's natural languages on the basis of particular linguistic features. Historically, the field has relied on manual extraction of linguistic feature values from textual descriptions of languages, which makes the task laborious and time-consuming and limits it to what human readers can process. In this study, we present a deep learning system for automatically extracting linguistic features from textual descriptions of natural languages. First, textual descriptions are manually annotated with special structures called semantic frames. A recurrent neural network learns these annotations and is then used to annotate unannotated text. Finally, the annotations are converted to linguistic feature values by a separate rule-based module. Word embeddings learned from general-purpose text serve as a major source of knowledge for the recurrent neural network. We compare the proposed system with a previously reported machine-learning-based system for the same task; the deep learning system achieves higher F1 scores by a fair margin. Such a system is expected to be a useful contribution to the automatic curation of typological databases, which are otherwise developed manually.
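
The final step of the pipeline described above, converting frame annotations into feature values with a rule-based module, might look roughly like the following minimal sketch. Everything in it is an assumption made for illustration: the frame names (`WORD_ORDER`, `ADPOSITION_PLACEMENT`), the frame element names, the feature names, and the majority-vote aggregation are hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FrameAnnotation:
    """One semantic-frame annotation over a sentence (hypothetical schema)."""
    frame: str                                     # frame name, e.g. "WORD_ORDER"
    elements: dict = field(default_factory=dict)   # frame element -> text filler


def word_order_rule(ann):
    """Hypothetical rule: map a WORD_ORDER frame to a basic word order value."""
    if ann.frame != "WORD_ORDER":
        return None
    order = ann.elements.get("order", "").upper().replace("-", "")
    if order in {"SOV", "SVO", "VSO", "VOS", "OVS", "OSV"}:
        return ("basic_word_order", order)
    return None


def adposition_rule(ann):
    """Hypothetical rule: map an ADPOSITION_PLACEMENT frame to a feature value."""
    if ann.frame != "ADPOSITION_PLACEMENT":
        return None
    kind = ann.elements.get("type", "").lower()
    if kind.startswith("postposition"):
        return ("adposition_type", "postpositions")
    if kind.startswith("preposition"):
        return ("adposition_type", "prepositions")
    return None


RULES = [word_order_rule, adposition_rule]


def extract_features(annotations):
    """Apply every rule to every annotation and keep the most frequent value
    per feature. Majority voting is an assumption of this sketch."""
    votes = {}
    for ann in annotations:
        for rule in RULES:
            result = rule(ann)
            if result is not None:
                feature, value = result
                votes.setdefault(feature, Counter())[value] += 1
    return {feature: counts.most_common(1)[0][0] for feature, counts in votes.items()}


# Annotations as a trained tagger might produce them for one grammar (made-up data).
annotations = [
    FrameAnnotation("WORD_ORDER", {"order": "S-O-V"}),
    FrameAnnotation("WORD_ORDER", {"order": "SOV"}),
    FrameAnnotation("WORD_ORDER", {"order": "svo"}),
    FrameAnnotation("ADPOSITION_PLACEMENT", {"type": "Postpositions"}),
]
features = extract_features(annotations)
print(features)
```

Keeping each rule as a small function that returns either a (feature, value) pair or nothing makes it easy to add rules for further frames without touching the aggregation step.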

Authors

Shafqat Mumtaz Virk

Göteborgs universitet

Daniel Foster

Göteborgs universitet

Muhammad Azam Sheikh

Chalmers, Data- och informationsteknik, CSE Verksamhetsstöd, Data Science Research Engineers

Raheela Saleem

GIFT University

International Conference Recent Advances in Natural Language Processing, RANLP

1313-8502 (ISSN)

1480-1489
9789544520724 (ISBN)

International Conference on Recent Advances in Natural Language Processing: Deep Learning for Natural Language Processing Methods and Applications, RANLP 2021
Virtual, Online

Subject categories

Language Technology (Computational Linguistics)

Comparative Language Studies and General Linguistics

Studies of Specific Languages

DOI

10.26615/978-954-452-072-4_166

More information

Last updated

2022-02-09