The following results are related to Digital Humanities and Cultural Heritage.
49 research outcomes, page 1 of 5
  • research data . 2022 . Embargo End Date: 24 Feb 2022
    Open Access
    Authors:
    Freienthal, Linda; Pelicon, Andraž; Martinc, Matej; Škrlj, Blaž; Krustok, Ivar; Pranjić, Marko; Cabrera-Diego, Luis Adrián; Purver, Matthew; Pollak, Senja; Kuulmets, Hele-Andra; ...
    Publisher: Ekspress Meedia Group
    Project: EC | EMBEDDIA (825153)

    This dataset contains articles from EMBEDDIA Media partners with various information added by the tools developed within the EMBEDDIA project: - 12,390 Estonian articles from 2019 with tags given by Ekspress Meedia. The complete dataset without the output of EMBEDDIA to...

  • research data . 2022 . Embargo End Date: 10 Feb 2022
    Restricted
    Authors:
    Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko; Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola; Ferme, Marko; Borovič, Mladen; Boškovič, Borko; Ojsteršek, Milan; ...
    Publisher: Faculty of Electrical Engineering and Computer Science, University of Maribor
    Project: EC | EMBEDDIA (825153)

    The Machine Translation datasets KAS-MT 1.0 contain automatically sentence-aligned Slovene and English plain-text abstracts from KAS-Abs 2.0 (http://hdl.handle.net/11356/1449) and are meant for studies in machine translation. The sentence alignment approach used requires ...

  • research data . 2022 . Embargo End Date: 10 Feb 2022
    Restricted
    Authors:
    Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko; Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola; Ferme, Marko; Borovič, Mladen; Boškovič, Borko; Ojsteršek, Milan; ...
    Publisher: Faculty of Electrical Engineering and Computer Science, University of Maribor
    Project: EC | EMBEDDIA (825153)

    Summarization datasets were created from the text bodies in the KAS 2.0 corpus (http://hdl.handle.net/11356/1448) and the abstracts from the KAS-Abs 2.0 corpus (http://hdl.handle.net/11356/1449). The monolingual slo2slo dataset contains 69,730 Slovene abstracts and Slov...

  • research data . 2022 . Embargo End Date: 10 Feb 2022
    Restricted
    Authors:
    Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko; Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola; Ferme, Marko; Borovič, Mladen; Boškovič, Borko; Ojsteršek, Milan; ...
    Publisher: Faculty of Electrical Engineering and Computer Science, University of Maribor
    Project: EC | EMBEDDIA (825153)

    The KAS-abs 2.0 corpus contains 125,202 automatically identified Slovenian and/or English abstracts from BSc/BA, MSc/MA, and PhD theses included in the KAS Corpus of Academic Slovene 2.0 (http://hdl.handle.net/11356/1448). The abstracts are either in Slovenian (*-abs-sl...

  • research data . 2022 . Embargo End Date: 04 Feb 2022
    Restricted
    Authors:
    Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko; Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola; Ferme, Marko; Borovič, Mladen; Boškovič, Borko; Ojsteršek, Milan; ...
    Publisher: Faculty of Electrical Engineering and Computer Science, University of Maribor
    Project: EC | EMBEDDIA (825153)

    The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA and 1,600 PhD theses (82 thousand texts, 5 million pages or 1.5 billion tokens) written in 2000-2018 and gathered from the digital libraries of Slovene higher education institution...

  • research data . 2021 . Embargo End Date: 04 Jun 2021
    Open Access
    Authors:
    Koloski, Boshko; Pollak, Senja; Škrlj, Blaž; Martinc, Matej;
    Publisher: Ekspress Meedia Group
    Project: EC | EMBEDDIA (825153)

    EACL Hackashop Keyword Challenge Datasets. In this repository you can find IDs of articles used for the keyword extraction challenge at EACL Hackashop on News Media Content Analysis and Automated Report Generation (http://embeddia.eu/hackashop2021/). The article IDs can ...

  • research data . 2021 . Embargo End Date: 24 May 2021
    Open Access
    Authors:
    Shekhar, Ravi; Purver, Matthew; Pollak, Senja; Pelicon, Andraž; Krustok, Ivar;
    Publisher: Ekspress Meedia Group
    Project: EC | EMBEDDIA (825153)

    The dataset is an archive of reader comments from the Delfi news site from 2014-2019, containing approximately 12M comments, mostly in the Latvian language, with some in Russian. Description of the Datasets. There are 6 CSV files: lv-comments-2014.csv contains 2 ...

  • research data . 2021 . Embargo End Date: 24 May 2021
    Open Access
    Authors:
    Pollak, Senja; Purver, Matthew; Shekhar, Ravi; Freienthal, Linda; Kuulmets, Hele-Andra; Krustok, Ivar;
    Publisher: Ekspress Meedia Group
    Project: EC | EMBEDDIA (825153)

    This dataset is an archive of articles from the Delfi news site from 2015-2019, containing over 180,000 articles (c. 50% in Latvian and 50% in the Russian language). Keywords for articles are included. There are 5 JSON files: lv_2015.json contains 42 001 articles from t...

  • research data . 2021 . Embargo End Date: 24 May 2021
    Open Access
    Authors:
    Shekhar, Ravi; Pollak, Senja; Pelicon, Andraž; Purver, Matthew; Krustok, Ivar;
    Publisher: Ekspress Meedia Group
    Project: EC | EMBEDDIA (825153)

    This dataset is an archive of reader comments on the Ekspress Meedia news site from 2009-2019, containing approximately 31M comments, mostly in the Estonian language, with some in Russian. Description of the Datasets. There are 11 CSV files: comments_2009.csv contains 2...

  • research data . 2021 . Embargo End Date: 24 May 2021
    Open Access
    Authors:
    Shekhar, Ravi; Pranjić, Marko; Pollak, Senja; Pelicon, Andraž; Purver, Matthew;
    Publisher: Styria Media Group
    Project: EC | EMBEDDIA (825153)

    A dataset of user comments, provided for research purposes within EMBEDDIA, a Horizon 2020 project, and extracted from the database of user comments from the 24sata.hr news portal. 24sata is the largest-circulation daily newspaper in Croatia, reaching on average 2 ...
