Quick search
Advanced search in
Research outcomes
Field to searchTerm
Add rule
The following results are related to Digital Humanities and Cultural Heritage. Are you interested to view more results? Visit OpenAIRE - Explore.
Download Results
59 research outcomes, page 1 of 6
  • research data . 2021 . Embargo End Date: 11 Mar 2021
    Open Access
    Authors:
    Nedoluzhko, Anna; Novák, Michal; Popel, Martin; Žabokrtský, Zdeněk; Zeman, Daniel;
    Persistent Identifiers
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | Bergamot (825303)

    CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 0.1 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morpho...

    Add to ORCID
  • research data . 2020 . Embargo End Date: 02 Jul 2020
    Open Access
    Authors:
    Çano, Erion;
    Persistent Identifiers
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | ELITR (825460)

    OAGL is a paper metadata dataset consisting of 17528680 records which comprise various scientific publication attributes like abstracts, titles, keywords, publication years, venues, etc. The last field of each record is the page length of the corresponding publication. ...

    Add to ORCID
  • research data . 2020 . Embargo End Date: 19 Jun 2020
    Open Access
    Authors:
    Barančíková, Petra; Bojar, Ondřej;
    Persistent Identifiers
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | Bergamot (825303)

    Costra 1.1 is a new dataset for testing geometric properties of sentence embeddings spaces. In particular, it concentrates on examining how well sentence embeddings capture complex phenomena such paraphrases, tense or generalization. The dataset is a direct expansion of...

    Add to ORCID
  • research data . 2020 . Embargo End Date: 16 Jul 2020
    Open Access
    Authors:
    Parida, Shantipriya; Bojar, Ondřej;
    Persistent Identifiers
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | ROXANNE (833635)

    Data ----- We have collected English-Odia parallel data for the purposes of NLP research of the Odia language. The data for the parallel corpus was extracted from existing parallel corpora such as OdiEnCorp 1.0 and PMIndia, and books which contain both English and Odia ...

    Add to ORCID
  • research data . 2020 . Embargo End Date: 14 Aug 2020
    Open Access
    Authors:
    Parida, Shantipriya; Bojar, Ondřej;
    Persistent Identifiers
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | ROXANNE (833635)

    Data ---- Hindi Visual Genome 1.1 is an updated version of Hindi Visual Genome 1.0. The update concerns primarily the text part of Hindi Visual Genome, fixing translation issues reported during WAT 2019 multimodal task. In the image part, only one segment and thus one i...

    Add to ORCID
  • research data . 2019 . Embargo End Date: 05 Dec 2019
    Open Access
    Authors:
    Barančíková, Petra; Bojar, Ondřej;
    Persistent Identifiers
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | Bergamot (825303)

    COSTRA 1.0 is a dataset of Czech complex sentence transformations. The dataset is intended for the study of sentence-level embeddings beyond simple word alternations or standard paraphrasing. The dataset consist of 4,262 unique sentences with average length of 10 words,...

    Add to ORCID
  • research data . 2019 . Embargo End Date: 31 Oct 2019
    Open Access
    Authors:
    Çano, Erion;
    Persistent Identifiers
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | ELITR (825460)

    OAGSX is a title generation dataset consisting of 34408509 abstracts and titles from scientific articles. The texts were lowercased and tokenized with Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples...

    Add to ORCID
  • research data . 2019 . Embargo End Date: 21 Oct 2019
    Open Access
    Authors:
    Çano, Erion;
    Persistent Identifiers
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | ELITR (825460)

    OAGKX is a keyword extraction/generation dataset consisting of 22674436 abstracts, titles and keyword strings from scientific articles. The texts were lowercased and tokenized with Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release ver...

    Add to ORCID
  • research data . 2019 . Embargo End Date: 12 Sep 2019
    Open Access
    Authors:
    Çano, Erion;
    Persistent Identifiers
    Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    Project: EC | ELITR (825460)

    OAGS is a title generation dataset consisting of 34993700 abstracts and titles from scientific articles. Texts were lowercased and tokenized with Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are...

    Add to ORCID
  • research data . 2019
    Open Access
    Authors:
    Macháček, Dominik; Kratochvíl, Jonáš; Vojtěchová, Tereza; Bojar, Ondřej;
    Persistent Identifiers
    Publisher: Springer International Publishing
    Project: EC | ELITR (825460)

    We present a test corpus of audio recordings and transcriptions of presentations of students' enterprises together with their slides and web-pages. The corpus is intended for evaluation of automatic speech recognition (ASR) systems, especially in conditions where the pr...

    Add to ORCID
59 research outcomes, page 1 of 6