Learning Domain-Specific Grammars from a Small Number of Examples
Paper in proceedings, 2021

In this chapter, we investigate grammar learning from a perspective that diverges from previous approaches. Prevailing approaches usually attempt to infer a grammar directly from example corpora without any additional information, which either requires a large training set or yields poor accuracy. We instead view grammar learning as a problem of grammar restriction, or subgrammar extraction. We start from a large-scale grammar (called a resource grammar) and a small number of example sentences, and find a subgrammar that still covers all the examples. To accomplish this, we formulate the problem as a constraint satisfaction problem and use a constraint solver to find the optimal grammar. We conducted experiments with English, Finnish, German, Swedish, and Spanish, which show that 10–20 examples are often sufficient to learn an interesting grammar for a specific application. We also present two extensions to this basic method: we include negative examples and allow rules to be merged. The resulting grammars can cover specific linguistic phenomena more precisely. Our method, together with the extensions, can be used to provide a grammar learning system for specific applications. This system is easy to use, human-centric, and usable by non-syntacticians. Based on this grammar learning method, we can build applications for computer-assisted language learning and interlingual communication, which rely heavily on the knowledge of language and domain experts who often lack the competence to develop the required grammars themselves.
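The subgrammar extraction idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes each example sentence has already been parsed with the resource grammar, that each parse is represented as the set of rule names it uses, and that "optimal" means the fewest rules (the abstract does not state the objective). Exhaustive enumeration stands in for the constraint solver, and the function name, rule names, and toy data are all hypothetical.

```python
from itertools import product


def smallest_covering_subgrammar(parses_per_sentence):
    """Return the smallest rule set that keeps one parse per sentence.

    parses_per_sentence: one entry per example sentence; each entry is a
    list of parses, and each parse is a frozenset of rule names.
    Returns None if some sentence has no parse at all.

    Assumption: grammar size (number of rules) is the objective, and
    brute-force search replaces the constraint solver.
    """
    best = None
    # Pick exactly one parse for every sentence, in every possible way.
    for choice in product(*parses_per_sentence):
        # The subgrammar must contain every rule used by the chosen parses.
        rules = frozenset().union(*choice)
        if best is None or len(rules) < len(best):
            best = rules
    return best


# Hypothetical toy data: two sentences with two candidate parses each.
examples = [
    [
        frozenset({"PredVP", "UsePron", "UseV"}),
        frozenset({"PredVP", "UsePron", "ComplSlash", "SlashV2"}),
    ],
    [
        frozenset({"PredVP", "DetCN", "UseV"}),
        frozenset({"UttNP", "DetCN", "AdvNP"}),
    ],
]

subgrammar = smallest_covering_subgrammar(examples)
print(sorted(subgrammar))
```

Enumeration grows exponentially with the number of ambiguous sentences, which is one reason a constraint solver is the more practical choice at realistic scale.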

Grammar learning

Grammar restriction

Constraint satisfaction

Domain-specific grammar

Authors

Herbert Lange

Chalmers, Computer Science and Engineering, Functional Programming

Peter Ljunglöf

Chalmers, Computer Science and Engineering, Functional Programming

Studies in Computational Intelligence

1860-949X (ISSN) 1860-9503 (eISSN)

Vol. 939, pp. 105-138

Natural Language Processing in Artificial Intelligence, NLPinAI 2020 held within the 12th International Conference on Agents and Artificial Intelligence, ICAART 2020
Valletta, Malta

Subject categories

Other Computer and Information Science

Language Technology (Computational Linguistics)

Computer Science

DOI

10.1007/978-3-030-63787-3_4

More information

Last updated

2021-06-22