research data . Dataset . 2021 . Embargo end date: 24 May 2021

Ptakopět data: the dataset for experiments on outbound translation

Novák, Michal; Zouhar, Vilém; Bojar, Ondřej
Open Access
  • Published: 24 May 2021
  • Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Abstract
This dataset was used for the Ptakopět experiment on outbound machine translation. It consists of screenshots of web forms with user queries entered; the queries are also available in text form. The dataset comprises two language versions, English and Czech. The English version has been fully post-processed (screenshots cropped, queries within the screenshots highlighted, dataset split based on its quality, etc.), whereas the Czech version is raw, as collected by the annotators.
Subjects
ACM Computing Classification System: ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
Funded by
EC | Bergamot
Project
Bergamot
Browser-based Multilingual Translation
  • Funder: European Commission (EC)
  • Project Code: 825303
  • Funding stream: H2020 | RIA
Communities
CLARIN
Digital Humanities and Cultural Heritage