Learning Domain-Specific Grammars from a Small Number of Examples

Herbert Lange; Peter Ljunglöf

doi:10.5220/0009371304220430

Paper in proceedings, 2020

In this chapter we investigate the problem of grammar learning from a perspective that diverges from previous approaches. Prevailing approaches usually attempt to infer a grammar directly from example corpora without any additional information, which either requires a large training set or yields poor accuracy. We instead view grammar learning as a problem of grammar restriction, or subgrammar extraction. We start from a large-scale grammar (called a resource grammar) and a small number of example sentences, and find a subgrammar that still covers all the examples. To accomplish this, we formulate the problem as a constraint satisfaction problem and use a constraint solver to find the optimal grammar. We conducted experiments with English, Finnish, German, Swedish, and Spanish, which show that 10–20 examples are often sufficient to learn an interesting grammar for a specific application. We also present two extensions to this basic method: we include negative examples and allow rules to be merged. The resulting grammars can cover specific linguistic phenomena more precisely. Our method, together with the extensions, can be used to provide a grammar learning system for specific applications. This system is easy to use, human-centric, and usable by non-syntacticians. Based on this grammar learning method, we can build applications for computer-assisted language learning and interlingual communication, which rely heavily on the knowledge of language and domain experts who often lack the competence to develop the required grammars themselves.
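
The idea of subgrammar extraction with positive and negative examples can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract describes a constraint solver, whereas this sketch swaps in a brute-force search, and it assumes that "optimal" means "fewest rules", which is only one possible objective. The function name smallest_subgrammar, the rule names r1 to r5, and the representation of each example as a list of candidate parse trees (each tree being the set of resource-grammar rules it uses) are all hypothetical and chosen for illustration.

```python
from itertools import combinations

def smallest_subgrammar(positive, negative=()):
    """Return a smallest rule set covering every positive example and no negative one.

    Each example is a list of candidate parse trees; each tree is a frozenset
    of the (hypothetical) rule names it uses. An example counts as covered
    when at least one of its trees uses only selected rules.
    Brute force stands in for a constraint solver here (an assumption).
    """
    all_rules = sorted({rule for trees in positive for tree in trees for rule in tree})
    for size in range(len(all_rules) + 1):
        for subset in combinations(all_rules, size):
            chosen = set(subset)
            covers_positive = all(
                any(tree <= chosen for tree in trees) for trees in positive
            )
            rejects_negative = all(
                not any(tree <= chosen for tree in trees) for trees in negative
            )
            if covers_positive and rejects_negative:
                return chosen
    return None  # no subgrammar satisfies all constraints

# Two positive examples, each ambiguous between two parse trees.
positive = [
    [frozenset({"r1", "r2"}), frozenset({"r1", "r3", "r4"})],
    [frozenset({"r2", "r5"}), frozenset({"r3", "r5"})],
]
# One negative example whose only parse tree uses r2 and r5.
negative = [[frozenset({"r2", "r5"})]]

print(sorted(smallest_subgrammar(positive)))            # ['r1', 'r2', 'r5']
print(sorted(smallest_subgrammar(positive, negative)))  # ['r1', 'r3', 'r4', 'r5']
```

In this toy input, adding the negative example forces the search away from the three-rule solution, because any rule set containing both r2 and r5 would also accept the unwanted sentence. The exhaustive search is exponential in the number of rules, which is one reason a dedicated solver is the more realistic choice for a large resource grammar.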

Domain-specific grammar

Grammar restriction

Grammar learning

Constraint satisfaction

Authors

Herbert Lange

University of Gothenburg

Chalmers, Computer Science and Engineering, Functional Programming

Peter Ljunglöf

Chalmers, Computer Science and Engineering, Functional Programming

University of Gothenburg

Studies in Computational Intelligence

1860-949X (ISSN) 1860-9503 (eISSN)

Vol. 1 105-138
9783030637866 (ISBN)

Natural Language Processing in Artificial Intelligence, NLPinAI 2020 held within the 12th International Conference on Agents and Artificial Intelligence, ICAART 2020
Valletta, Malta

Subject categories (SSIF 2011)

Other Computer and Information Science

Language Technology (Computational Linguistics)

Computer Science

Last updated

2025-09-19
