research data . Dataset . 2019 . Embargo end date: 15 Jul 2019

A Speech Test Set of Practice Business Presentations with Additional Relevant Texts

Macháček, Dominik; Kratochvíl, Jonáš; Vojtěchová, Tereza; Bojar, Ondřej
Open Access
  • Published: 13 Jul 2019
  • Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Abstract
We present a test corpus of audio recordings and transcriptions of presentations of students' enterprises, together with their slides and web pages. The corpus is intended for the evaluation of automatic speech recognition (ASR) systems, especially in conditions where prior availability of in-domain vocabulary and named entities is beneficial. The corpus consists of 39 presentations in English, each up to 90 seconds long, and slides and web pages in Czech, Slovak, English, German, Romanian, Italian or Spanish. The speakers are high school students from European countries with English as their second language. We benchmark three baseline ASR systems on the corpus...
Subjects
ACM Computing Classification System: InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
Funded by
EC | ELITR
Project
ELITR
European Live Translator
  • Funder: European Commission (EC)
  • Project Code: 825460
  • Funding stream: H2020 | RIA
Communities
CLARIN
Digital Humanities and Cultural Heritage