Michal Škrabal, Pavel Machač
[Articles]
Forvo.com: So many people, so many recordings
The article describes the Czech section of the crowdsourced audio dictionary available on the website forvo.com (2008–2021), which is remarkable for its scope, reach, linguistic diversity, and the unique variability of the pronunciation it records. We compare the website with other open multilingual databases of audio recordings and touch on the dichotomy between the website's intended concept and its actual form. We also briefly characterize the list of Czech entries and summarize the strengths and weaknesses of the available data for scientific purposes. We then consider the two typical users of the website: the provider of audio data (speaker), whose speech behaviour is clearly influenced by the specific speech situation during recording, and the non-native lay recipient (listener), who depends entirely on trusting that the specific pronunciation variants are representative. Finally, we define the notion of representativeness, which will serve in our subsequent study as an evaluation framework for the phonetic analysis of the recordings.
Key words: audio pronunciation dictionary, citizen science, crowdsourcing, forvo.com
Key words (Czech): crowdsourcing, forvo.com, občanská věda, zvukový výslovnostní slovník
The text is available online in the CEEOL database.
Michal Škrabal
Institute of the Czech National Corpus, Faculty of Arts, Charles University (Ústav Českého národního korpusu FF UK)
Panská 890/7, 110 00 Praha 1
michal.skrabal@ff.cuni.cz
Pavel Machač
Institute of General Linguistics, Faculty of Arts, Charles University (Ústav obecné lingvistiky FF UK)
náměstí Jana Palacha 1/2, 116 38 Praha 1
pavel.machac@ff.cuni.cz
Naše řeč, volume 104 (2021), issue 3