68 research outcomes, page 1 of 7
  • research data . 2022 . Embargo End Date: 20 Apr 2022
    Open Access
    Authors:
    Aires, João Paulo;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | Bergamot (825303)

    Document-level testsuite for evaluation of gender translation consistency. Our Document-Level test set consists of selected English documents from the WMT21 newstest annotated with gender information. Czech unannotated references are also added for convenience. We semi-...

  • research data . 2022 . Embargo End Date: 06 Apr 2022
    Open Access
    Authors:
    Nedoluzhko, Anna; Novák, Michal; Popel, Martin; Žabokrtský, Zdeněk; Zeldes, Amir; Zeman, Daniel; Bourgonje, Peter; Cinková, Silvie; Hajič, Jan; Hardmeier, Christian; ...
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | Bergamot (825303)

    CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 1.0 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morpho...

  • research data . 2022 . Embargo End Date: 26 May 2022
    Open Access
    Authors:
    Nedoluzhko, Anna; Singh, Muskaan; Hledíková, Marie; Tirthankar, Ghosal; Bojar, Ondřej;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | ELITR (825460)

    ELITR Minuting Corpus consists of transcripts of meetings in Czech and English, their manually created summaries ("minutes") and manual alignments between the two. Czech meetings are in the computer science and public administration domains and English meetings are in t...

  • research data . 2022 . Embargo End Date: 09 Mar 2022
    Open Access
    Authors:
    Bhattacharya, Sunit; Kloudová, Věra; Zouhar, Vilém; Bojar, Ondřej;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | Bergamot (825303)

    Eyetracked Multi-Modal Translation (EMMT) is a simultaneous eye-tracking, 4-electrode EEG and audio corpus for multi-modal reading and translation scenarios. It contains monocular eye movement recordings, audio data and 4-electrode wearable electroencephalogram (EEG) da...

  • research data . 2021 . Embargo End Date: 14 Dec 2021
    Open Access
    Authors:
    Nedoluzhko, Anna; Novák, Michal; Popel, Martin; Žabokrtský, Zdeněk; Zeman, Daniel;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | Bergamot (825303)

    CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 0.2 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morpho...

  • research data . 2021 . Embargo End Date: 18 Jun 2021
    Open Access
    Authors:
    Macháček, Dominik; Žilinec, Matúš; Bojar, Ondřej;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | ELITR (825460)

    ESIC (Europarl Simultaneous Interpreting Corpus) is a corpus of 370 speeches (10 hours) in English, with manual transcripts, transcribed simultaneous interpreting into Czech and German, and parallel translations. The corpus contains source English videos and audios. The...

  • research data . 2021 . Embargo End Date: 28 Jun 2021
    Open Access
    Authors:
    Kopp, Matyáš; Stankov, Vladislav; Bojar, Ondřej; Hladká, Barbora; Straňák, Pavel;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | ELITR (825460)

    The ParCzech 3.0 corpus is the third version of ParCzech consisting of stenographic protocols that record the Chamber of Deputies’ meetings held in the 7th term (2013-2017) and the current 8th term (2017-Mar 2021). The protocols are provided in their original HTML forma...

  • research data . 2021 . Embargo End Date: 24 May 2021
    Open Access
    Authors:
    Novák, Michal; Zouhar, Vilém; Bojar, Ondřej;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | Bergamot (825303)

    The dataset used for the Ptakopět experiment on outbound machine translation. It consists of screenshots of web forms with user queries entered. The queries are available also in a text form. The dataset comprises two language versions: English and Czech. Whereas the En...

  • research data . 2021 . Embargo End Date: 11 Mar 2021
    Open Access
    Authors:
    Nedoluzhko, Anna; Novák, Michal; Popel, Martin; Žabokrtský, Zdeněk; Zeman, Daniel;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | Bergamot (825303)

    CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 0.1 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morpho...

  • research data . 2020 . Embargo End Date: 02 Jul 2020
    Open Access
    Authors:
    Çano, Erion;
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | ELITR (825460)

    OAGL is a paper metadata dataset consisting of 17528680 records which comprise various scientific publication attributes like abstracts, titles, keywords, publication years, venues, etc. The last field of each record is the page length of the corresponding publication. ...
