Sampling and genealogical coverage in WALS
Journal article, 2009

WALS was designed with the goal of providing a "systematic answer" to questions about the geographical distribution of language features. In order to achieve this goal, there must be an adequate sample of the world's languages included in WALS. In this article we investigate to what extent WALS fulfils its aim of maximizing the genealogical diversity of the samples of languages included. For this we look at the core-200 sample (included on almost all maps) as well as the 1,370 sample for the feature OV/VO word order (the sample with the largest number of languages). The genealogical diversity in these samples is compared against a database of "what could have been done", i.e., a database of which language families have adequate descriptive resources for the task at hand. In the 200 sample, we find a highly significant overinclusion of Eurasian languages at the expense of South American and Papuan languages. In the 1,370 sample, we find a highly significant overinclusion of North American languages at the expense of South American and Papuan languages. It follows that statistics based on these WALS samples cannot be used straightforwardly for sound inferences about the distribution of the features in question.

Word order

Linguistic atlas

Genealogical classification

Sampling

Author

Harald Hammarström

Chalmers, Computer Science and Engineering, Computing Science

Linguistic Typology

1430-0532 (ISSN) 1613-415X (eISSN)

Vol. 13, Issue 1, pp. 105-119

Subject categories

Computer and Information Science

DOI

10.1515/LITY.2009.006

More information

Created

2017-12-06