The following results are related to Digital Humanities and Cultural Heritage.
37 research outcomes, page 1 of 4
  • research data . 2020 . Embargo End Date: 06 Apr 2021
    Open Access
    Authors:
    Supej, Anka; Ulčar, Matej; Robnik-Šikonja, Marko; Pollak, Senja;
    Publisher: Jožef Stefan Institute
    Project: EC | EMBEDDIA (825153)

    The list of single-word occupations in Slovene is based on the Slovene Standard Classification of Occupations (https://www.uradni-list.si/glasilo-uradni-list-rs/vsebina?urlid=199728&stevilka=1641). The list includes 234 occupation pairs. For each occupation, it contains...

  • research data . 2020 . Embargo End Date: 24 Sep 2020
    Open Access
    Authors:
    Pelicon, Andraž; Pranjić, Marko; Miljković, Dragana; Škrlj, Blaž; Pollak, Senja;
    Publisher: Jožef Stefan Institute
    Project: EC | EMBEDDIA (825153)

    We present a collection of sentiment annotations for news articles (article links) in the Croatian language. A set of 2025 news articles was gathered from 24sata, one of the leading media companies in Croatia with the highest circulation. Six annotators annotated the articles...

  • research data . 2020 . Embargo End Date: 24 Sep 2020
    Open Access
    Authors:
    Pollak, Senja; Arhar Holdt, Špela; Krek, Simon; Robnik-Šikonja, Marko;
    Publisher: Jožef Stefan Institute
    Project: EC | EMBEDDIA (825153)

    The reference list of the most frequent Slovene common words was prepared by selecting vocabulary at the intersection of the most frequent 10,000 lemmas of four Slovene text corpora: the balanced reference corpus of written Slovene Kres, the reference corpus of spoken Slove...

  • research data . 2020 . Embargo End Date: 02 Sep 2020
    Open Access
    Authors:
    Vuković, Teodora;
    Publisher: Slavisches Seminar, University of Zurich

    The Torlak corpus represents a spoken variety of the endangered Torlak dialect from the Timok area in Southeast Serbia. It comprises transcripts of interviews with the local population, collected in the field between 2015 and 2017. Semi-structured interviews were conducted ...

  • research data . 2020 . Embargo End Date: 23 Sep 2020
    Open Access
    Authors:
    Škvorc, Tadej; Gantar, Polona; Robnik-Šikonja, Marko;
    Publisher: Faculty of Computer and Information Science, University of Ljubljana
    Project: EC | EMBEDDIA (825153)

    SloIE is a manually labelled dataset of Slovene idiomatic expressions. It contains 29,400 sentences with 75 different expressions that can occur with either a literal or an idiomatic meaning, with appropriate manual annotations for each token. The idiomatic expressions ...

  • research data . 2020 . Embargo End Date: 09 Mar 2021
    Open Access
    Authors:
    Pollak, Senja; Vulić, Ivan; Pelicon, Andraž; Repar, Andraž; Armendariz, Carlos; Matthew, Purver; Ljubešić, Nikola;
    Publisher: University of Ljubljana
    Project: EC | EMBEDDIA (825153)

    The resource contains the English SimLex-999 word pairs (Hill et al. 2015) and their Slovene translations. In the translation process, the word pairs were first translated by two translators independently, and next, for the examples where the translations differed, the final translati...

  • research data . 2020 . Embargo End Date: 30 Oct 2020
    Open Access
    Authors:
    Armendariz, Carlos; Matthew, Purver; Ulčar, Matej; Pollak, Senja; Ljubešić, Nikola; Robnik-Šikonja, Marko; Granroth-Wilding, Mark; Vaik, Kristiina;
    Publisher: Queen Mary University
    Project: EC | EMBEDDIA (825153)

    The dataset contains human similarity ratings for pairs of words. The annotators were presented with contexts that contained both of the words in the pair and the dataset features two different contexts per pair. The words were sourced from the English, Croatian, Finnis...

  • research data . 2019 . Embargo End Date: 25 Nov 2019
    Open Access
    Authors:
    Ulčar, Matej;
    Publisher: Faculty of Computer and Information Science, University of Ljubljana
    Project: EC | EMBEDDIA (825153)

    ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on large monolingual corpora for 7 languages: Slovenian, Croatian, Finnish, Estonian, Latvian, Lithuanian and Swedish. Each language's model was trained for appr...

  • research data . 2019 . Embargo End Date: 25 Nov 2019
    Open Access
    Authors:
    Ulčar, Matej; Vaik, Kristiina; Lindström, Jessica; Linde, Dace; Dailidėnaitė, Milda; Šumakov, Andrei;
    Publisher: Faculty of Computer and Information Science, University of Ljubljana
    Project: EC | EMBEDDIA (825153)

    The word analogy task evaluates word embeddings based on analogous word pairs (e.g. "Paris - France" should be equivalent to "Rome - Italy", and "son - daughter" should be equivalent to "brother - sister"). The dataset was inspired by Mikolov's analogy test set in English ...

  • research data . 2019 . Embargo End Date: 15 Oct 2019
    Open Access
    Authors:
    Ulčar, Matej;
    Publisher: Faculty of Computer and Information Science, University of Ljubljana
    Project: EC | EMBEDDIA (825153)

    ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on the entire Gigafida 2.0 corpus (https://viri.cjvt.si/gigafida/System/Impressum) for 10 epochs. The 1,364,064 most common tokens were provided as vocabulary during the...
