research data . Dataset . 2021 . Embargo end date: 18 Jun 2021

ESIC 1.0 -- Europarl Simultaneous Interpreting Corpus

Macháček, Dominik; Žilinec, Matúš; Bojar, Ondřej
Open Access
  • Published: 14 Jun 2021
  • Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
ESIC (Europarl Simultaneous Interpreting Corpus) is a corpus of 370 speeches (10 hours) in English, with manual transcripts, transcribed simultaneous interpreting into Czech and German, and parallel translations. The corpus contains the source English video and audio. The interpreters' voices are not published within the corpus, but a tool is provided that downloads them from the European Parliament website, where they are publicly available. The transcripts are punctuated, carry word-level timestamps, and are annotated with metadata (disfluencies, mixing voices and languages, read or spontaneous speech, etc.). The speeches in the corpus come from the European Parliament ple...
Funded by
European Live Translator
  • Funder: European Commission (EC)
  • Project Code: 825460
  • Funding stream: H2020 | RIA
Digital Humanities and Cultural Heritage