The following results are related to Digital Humanities and Cultural Heritage.
55 research outcomes, page 1 of 6
  • research data . 2018 . Embargo End Date: 08 Aug 2019
    Restricted
    Authors:
    Turchi, Marco; Negri, Matteo; Chatterjee, Rajen;
    Publisher: Fondazione Bruno Kessler, Trento, Italy
    Project: EC | QT21 (645452)

    Human post-edited and reference test sentences for the En-De PBSMT WMT 2018 Automatic post-editing task. This consists of 2,000 German sentences for each file belonging to the IT domain and already tokenized. All data is provided by the EU project QT21 (http://www.qt21....

  • research data . 2016
    Open Access
    Authors:
    Springmann, Uwe; Fink, Florian;
    Publisher: Zenodo
    Project: EC | CLARIN (212230)

    The 2-day CIS OCR Workshop on "OCR and postcorrection of early printings for digital humanities" originally held at LMU, Munich 14/15 September 2015 (see http://www.cis.lmu.de/ocrworkshop). Release date: 2016-02-25. CIS OCR Workshop by U...

  • research data . 2019 . Embargo End Date: 15 Jul 2019
    Open Access
    Authors:
    Macháček, Dominik; Kratochvíl, Jonáš; Vojtěchová, Tereza; Bojar, Ondřej;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | ELITR (825460)

    We present a test corpus of audio recordings and transcriptions of presentations of students' enterprises together with their slides and web-pages. The corpus is intended for evaluation of automatic speech recognition (ASR) systems, especially in conditions where the pr...

  • research data . 2017 . Embargo End Date: 27 Feb 2017
    Restricted
    Authors:
    Specia, Lucia; Logacheva, Varvara;
    Publisher: University of Sheffield
    Project: EC | QT21 (645452)

    Training and development data for the WMT17 QE task. Test data will be published as a separate item. This shared task will build on its previous five editions to further examine automatic methods for estimating the quality of machine translation output at run-time, with...

  • research data . 2019 . Embargo End Date: 12 Sep 2019
    Open Access
    Authors:
    Çano, Erion;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | ELITR (825460)

    OAGS is a title generation dataset consisting of 34993700 abstracts and titles from scientific articles. Texts were lowercased and tokenized with Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are...

  • research data . 2017 . Embargo End Date: 03 Apr 2017
    Open Access
    Authors:
    Pecina, Pavel; Dušek, Ondřej; Hajič, Jan; Libovický, Jindřich; Urešová, Zdeňka;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | KHRESMOI (257528)

    This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish and Swedish. The queries come from the general public and medical experts. This is version 2.0 extending ...

  • research data . 2020 . Embargo End Date: 19 Jun 2020
    Open Access
    Authors:
    Barančíková, Petra; Bojar, Ondřej;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | Bergamot (825303)

    Costra 1.1 is a new dataset for testing geometric properties of sentence embedding spaces. In particular, it concentrates on examining how well sentence embeddings capture complex phenomena such as paraphrases, tense or generalization. The dataset is a direct expansion of...

  • research data . 2013 . Embargo End Date: 02 Apr 2014
    Open Access
    Authors:
    Pecina, Pavel; Dušek, Ondřej; Hajič, Jan; Urešová, Zdeňka;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | KHRESMOI (257528)

    This package contains data sets for development and testing of machine translation of short medical search queries between Czech, English, French, and German. The queries come from the general public and medical experts.

  • research data . 2016 . Embargo End Date: 22 Mar 2016
    Open Access
    Authors:
    Kamran, Amir; Jawaid, Bushra; Bojar, Ondřej; Stanojevic, Milos;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | QT21 (645452)

    This item contains models to tune for the WMT16 Tuning shared task for English-to-Czech. CzEng 1.6pre (http://ufal.mff.cuni.cz/czeng/czeng16pre) corpus is used for the training of the translation models. The data is tokenized (using Moses tokenizer), lowercased and sent...

  • research data . 2016 . Embargo End Date: 21 Feb 2016
    Restricted
    Authors:
    Turchi, Marco; Chatterjee, Rajen; Negri, Matteo;
    Publisher: Fondazione Bruno Kessler, Trento, Italy
    Project: EC | QT21 (645452)

    Training, development and test data (the same used for the Sentence-level Quality Estimation task) consist of English-German triplets (source, target and post-edit) belonging to the IT domain and already tokenized. Training and development respectively contain 12,000 an...
