The following results are related to Digital Humanities and Cultural Heritage.
40 research outcomes, page 1 of 4
  • research data . 2016 . Embargo End Date: 29 May 2016
    Open Access
    Authors:
    Popović, Maja; Arčan, Mihael;
    Publisher: Insight Centre for Data Analytics, National University of Ireland, Galway
    Project: EC | TraMOOC (644333)

    The PE²rr corpus contains source language texts from different domains along with their automatically generated translations into several morphologically rich languages, their post-edited versions, and error annotations of the performed post-edit operations. The main ad...

  • research data . 2020 . Embargo End Date: 30 Oct 2020
    Open Access
    Authors:
    Armendariz, Carlos; Purver, Matthew; Ulčar, Matej; Pollak, Senja; Ljubešić, Nikola; Robnik-Šikonja, Marko; Granroth-Wilding, Mark; Vaik, Kristiina;
    Publisher: Queen Mary University
    Project: EC | EMBEDDIA (825153)

    The dataset contains human similarity ratings for pairs of words. The annotators were presented with contexts that contained both of the words in the pair and the dataset features two different contexts per pair. The words were sourced from the English, Croatian, Finnis...

  • research data . 2016 . Embargo End Date: 23 Jun 2016
    Open Access
    Authors:
    Ljubešić, Nikola;
    Publisher: Faculty of Humanities and Social Sciences, University of Zagreb
    Project: EC | ABU-MATRAN (324414)

    srLex is a large inflectional lexicon of the Serbian language in which each entry is a (wordform, lemma, MSD, frequency, per-million frequency) 5-tuple. The (wordform, lemma, MSD) triple frequencies are calculated on the srWaC v1.2 corpus. The MSD tagset follows the M...

  • research data . 2019 . Embargo End Date: 15 Oct 2019
    Open Access
    Authors:
    Ulčar, Matej;
    Publisher: Faculty of Computer and Information Science, University of Ljubljana
    Project: EC | EMBEDDIA (825153)

    An ELMo language model (https://github.com/allenai/bilm-tf) for producing contextual word embeddings, trained on the entire Gigafida 2.0 corpus (https://viri.cjvt.si/gigafida/System/Impressum) for 10 epochs. The 1,364,064 most common tokens were provided as vocabulary during the...

  • research data . 2017 . Embargo End Date: 21 Jun 2017
    Open Access
    Authors:
    Dobrišek, Simon; Žganec Gros, Jerneja; Žibert, Janez; Mihelič, France; Pavešić, Nikola;
    Publisher: Faculty of Electrical Engineering, University of Ljubljana
    Project: EC | FLUINHIBIT (201634)

    The SOFES speech database (Spoken Flight Enquiries in Slovene) is a collection of transcribed and segmented audio recordings of spoken flight-information enquiries in Slovene. SOFES is built on the basis of the GOPOLIS speech database, which was acquired and compiled by...

  • research data . 2021 . Embargo End Date: 19 May 2021
    Open Access
    Authors:
    Purver, Matthew; Shekhar, Ravi; Pranjić, Marko; Pollak, Senja; Martinc, Matej;
    Publisher: Styria Media Group
    Project: EC | EMBEDDIA (825153)

    The 24sata news portal comprises a main portal with daily news and several smaller portals covering specific topics, such as automotive news, health, culinary content, and lifestyle advice. The dataset contains over 650,000 articles in Croatian from 2007 to 2019,...

  • research data . 2016 . Embargo End Date: 09 Mar 2016
    Restricted
    Authors:
    Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio; Klubička, Filip; Toral, Antonio;
    Publisher: Jožef Stefan Institute
    Project: EC | ABU-MATRAN (324414)

    The hrenWaC corpus version 2.0 consists of parallel Croatian-English texts crawled from the .hr top-level domain for Croatia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing used for crawl...

  • research data . 2016 . Embargo End Date: 25 Apr 2016
    Open Access
    Authors:
    Mozetič, Igor; Grčar, Miha; Smailović, Jasmina;
    Publisher: Jožef Stefan Institute
    Project: EC | SIMPOL (610704)

    The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora for the corresponding 15 European languages. The data can be used to train and evaluate Twitter sentiment classifiers, to compute annotator...

  • research data . 2016 . Embargo End Date: 05 Mar 2016
    Open Access
    Authors:
    Ljubešić, Nikola; Klubička, Filip;
    Publisher: Faculty of Humanities and Social Sciences, University of Zagreb
    Project: EC | ABU-MATRAN (324414)

    hrLex is a large inflectional lexicon of the Croatian language in which each entry is a (wordform, lemma, MSD) triple. The MSD tagset follows the revised MULTEXT-East V4 tagset for Croatian and Serbian, available at https://github.com/ffnlp/sethr/blob/master/mte4r-upo...

  • research data . 2014 . Embargo End Date: 22 May 2015
    Open Access
    Authors:
    Erjavec, Tomaž;
    Publisher: Jožef Stefan Institute
    Project: EC | IMPACT (215064)

    The IMP digital library contains historical Slovene books and other publications: 658 texts in total, with over 45,000 pages, from the period 1584-1919. Each text contains extensive metadata, per-page links to facsimiles, and hand-corrected transcriptions with structural...
