The following results are related to Digital Humanities and Cultural Heritage.
19 research outcomes, page 1 of 2
  • research data . 2021 . Embargo End Date: 04 Jun 2021
    Open Access
    Authors:
    Koloski, Boshko; Pollak, Senja; Škrlj, Blaž; Martinc, Matej;
    Publisher: Ekspress Meedia Group
    Project: EC | EMBEDDIA (825153)

    EACL Hackashop Keyword Challenge Datasets. This repository contains the IDs of articles used for the keyword extraction challenge at the EACL Hackashop on News Media Content Analysis and Automated Report Generation (http://embeddia.eu/hackashop2021/). The article IDs can ...

  • research data . 2021 . Embargo End Date: 19 May 2021
    Open Access
    Authors:
    Purver, Matthew; Shekhar, Ravi; Pranjić, Marko; Pollak, Senja; Martinc, Matej;
    Publisher: Styria Media Group
    Project: EC | EMBEDDIA (825153)

    The 24sata news portal consists of a main portal with daily news and several smaller portals covering specific topics, such as automotive news, health, culinary content, and lifestyle advice. The dataset contains over 650,000 articles in Croatian from 2007 to 2019,...

  • research data . 2021 . Embargo End Date: 24 May 2021
    Open Access
    Authors:
    Purver, Matthew; Pollak, Senja; Freienthal, Linda; Kuulmets, Hele-Andra; Krustok, Ivar; Shekhar, Ravi;
    Publisher: Ekspress Meedia Group
    Project: EC | EMBEDDIA (825153)

    The dataset is an archive of articles from the Ekspress Meedia news site from 2009-2019, containing over 1.4M articles, mostly in Estonian (1,115,120 articles) with some in Russian (325,952 articles). Keywords are included for articles after 2015. The main arch...

  • research data . 2021 . Embargo End Date: 24 May 2021
    Open Access
    Authors:
    Shekhar, Ravi; Purver, Matthew; Pollak, Senja; Pelicon, Andraž; Krustok, Ivar;
    Publisher: Ekspress Meedia Group
    Project: EC | EMBEDDIA (825153)

    The dataset is an archive of reader comments from the Delfi news site from 2014-2019, containing approximately 12M comments, mostly in the Latvian language, with some in Russian. Description of the Datasets. There are 6 CSV files: lv-comments-2014.csv contains 2 ...

  • research data . 2021 . Embargo End Date: 24 May 2021
    Open Access
    Authors:
    Pollak, Senja; Purver, Matthew; Shekhar, Ravi; Freienthal, Linda; Kuulmets, Hele-Andra; Krustok, Ivar;
    Publisher: Ekspress Meedia Group
    Project: EC | EMBEDDIA (825153)

    This dataset is an archive of articles from the Delfi news site from 2015-2019, containing over 180,000 articles (c. 50% in Latvian and 50% in the Russian language). Keywords for articles are included. There are 5 JSON files: lv_2015.json contains 42 001 articles from t...

  • research data . 2021 . Embargo End Date: 24 May 2021
    Open Access
    Authors:
    Shekhar, Ravi; Pollak, Senja; Pelicon, Andraž; Purver, Matthew; Krustok, Ivar;
    Publisher: Ekspress Meedia Group
    Project: EC | EMBEDDIA (825153)

    This dataset is an archive of reader comments on the Ekspress Meedia news site from 2009-2019, containing approximately 31M comments, mostly in the Estonian language, with some in Russian. Description of the Datasets. There are 11 CSV files: comments_2009.csv contains 2...

  • research data . 2021 . Embargo End Date: 24 May 2021
    Open Access
    Authors:
    Shekhar, Ravi; Pranjić, Marko; Pollak, Senja; Pelicon, Andraž; Purver, Matthew;
    Publisher: Styria Media Group
    Project: EC | EMBEDDIA (825153)

    This dataset of user comments, provided for research purposes for EMBEDDIA, a Horizon 2020 project, was extracted from the database of user comments on the 24sata.hr news portal. 24sata is the largest-circulation daily newspaper in Croatia, reaching on average 2 ...

  • research data . 2020 . Embargo End Date: 06 Apr 2021
    Open Access
    Authors:
    Supej, Anka; Ulčar, Matej; Robnik-Šikonja, Marko; Pollak, Senja;
    Publisher: Jožef Stefan Institute
    Project: EC | EMBEDDIA (825153)

    The list of single-word occupations in Slovene is based on the Slovene Standard Classification of Occupations (https://www.uradni-list.si/glasilo-uradni-list-rs/vsebina?urlid=199728&stevilka=1641). The list includes 234 occupation pairs. For each occupation, it contains...

  • research data . 2020 . Embargo End Date: 24 Sep 2020
    Open Access
    Authors:
    Pelicon, Andraž; Pranjić, Marko; Miljković, Dragana; Škrlj, Blaž; Pollak, Senja;
    Publisher: Jožef Stefan Institute
    Project: EC | EMBEDDIA (825153)

    We present a collection of sentiment annotations for news articles (article links) in the Croatian language. A set of 2025 news articles was gathered from 24sata, one of the leading media companies in Croatia with the highest circulation. Six annotators annotated the articles...

  • research data . 2020 . Embargo End Date: 24 Sep 2020
    Open Access
    Authors:
    Pollak, Senja; Arhar Holdt, Špela; Krek, Simon; Robnik-Šikonja, Marko;
    Publisher: Jožef Stefan Institute
    Project: EC | EMBEDDIA (825153)

    The reference list of the most frequent common Slovene words was prepared by selecting vocabulary at the intersection of the 10,000 most frequent lemmas of four Slovene text corpora: the balanced reference corpus of written Slovene Kres, the reference corpus of spoken Slove...
