Journal Naše řeč

Korpus a reprezentativnost

Jan Chromý

[Articles]

Corpus and representativeness

This paper discusses the concept of representativeness in corpus linguistics. Representativeness is a concept from empirical, quantitative science that characterizes the relationship between a sample and a population. The paper argues that the population for standard, supposedly “representative” corpora of a whole language cannot be defined. A population can be reliably defined only for specialized corpora (e.g. corpora of newspaper texts); hence only corpora of this type can be truly statistically representative. The paper also discusses the idea that representativeness could be considered from the perspective of particular linguistic items rather than of the whole language: the same corpus may be representative of the use of one item and, at the same time, not representative of the use of another.

Key words: corpus, representativeness, specialized corpora, population, inferential statistics
Key words (Czech): korpus, reprezentativnost, specializované korpusy, populace, inferenční statistika

The text is available online in the CEEOL database.

Ústav českého jazyka a teorie komunikace FF UK
nám. Jana Palacha 2, 116 38 Praha 1
jan.chromy@ff.cuni.cz

Naše řeč, volume 97 (2014), issue 4–5, pp. 185–193
