Advanced search in Research products
The following results are related to Digital Humanities and Cultural Heritage.
39 Research products, page 1 of 4

  • Digital Humanities and Cultural Heritage
  • Research data
  • Other research products
  • 2012-2021
  • European Commission
  • CLARIN.SI repository

  • Research data . 2016 . Embargo End Date: 23 Jun 2016
    Open Access
    Authors: 
    Ljubešić, Nikola;
    Publisher: Faculty of Humanities and Social Sciences, University of Zagreb
    Project: EC | ABU-MATRAN (324414)

    srLex is a large inflectional lexicon of the Serbian language in which each entry is a (wordform, lemma, MSD, frequency, per-million frequency) 5-tuple. The (wordform, lemma, MSD) triple frequencies are calculated on the srWaC v1.2 corpus. The MSD tagset follows the M...

  • Research data . 2016 . Embargo End Date: 29 May 2016
    Open Access
    Authors: 
    Popović, Maja; Arčan, Mihael;
    Publisher: Insight Centre for Data Analytics, National University of Ireland, Galway
    Project: EC | TraMOOC (644333)

    The PE²rr corpus contains source language texts from different domains along with their automatically generated translations into several morphologically rich languages, their post-edited versions, and error annotations of the performed post-edit operations. The main ad...

  • Research data . 2019 . Embargo End Date: 15 Oct 2019
    Open Access
    Authors: 
    Ulčar, Matej;
    Publisher: Faculty of Computer and Information Science, University of Ljubljana
    Project: EC | EMBEDDIA (825153)

    ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on the entire Gigafida 2.0 corpus (https://viri.cjvt.si/gigafida/System/Impressum) for 10 epochs. The 1,364,064 most common tokens were provided as vocabulary during the...

  • Research data . 2020 . Embargo End Date: 30 Oct 2020
    Open Access
    Authors: 
    Armendariz, Carlos; Purver, Matthew; Ulčar, Matej; Pollak, Senja; Ljubešić, Nikola; Robnik-Šikonja, Marko; Granroth-Wilding, Mark; Vaik, Kristiina;
    Publisher: Queen Mary University
    Project: EC | EMBEDDIA (825153)

    The dataset contains human similarity ratings for pairs of words. The annotators were presented with contexts that contained both of the words in the pair and the dataset features two different contexts per pair. The words were sourced from the English, Croatian, Finnis...

  • Research data . 2021 . Embargo End Date: 19 May 2021
    Open Access
    Authors: 
    Purver, Matthew; Shekhar, Ravi; Pranjić, Marko; Pollak, Senja; Martinc, Matej;
    Publisher: Styria Media Group
    Project: EC | EMBEDDIA (825153)

    The 24sata news portal consists of a portal with daily news and several smaller portals covering news from specific topics, such as automotive news, health, culinary content, and lifestyle advice. The dataset contains over 650,000 articles in Croatian from 2007 to 2019,...

  • Research data . 2017 . Embargo End Date: 21 Jun 2017
    Open Access
    Authors: 
    Dobrišek, Simon; Žganec Gros, Jerneja; Žibert, Janez; Mihelič, France; Pavešić, Nikola;
    Publisher: Faculty of Electrical Engineering, University of Ljubljana
    Project: EC | FLUINHIBIT (201634)

    The SOFES speech database (Spoken Flight Enquiries in Slovene) is a collection of transcribed and segmented audio recordings of spoken flight-information enquiries in Slovene. SOFES is built on the basis of the GOPOLIS speech database, which was acquired and compiled by...

  • Research data . 2016 . Embargo End Date: 09 Mar 2016
    Restricted
    Authors: 
    Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio; Klubička, Filip; Toral, Antonio;
    Publisher: Jožef Stefan Institute
    Project: EC | ABU-MATRAN (324414)

    The hrenWaC corpus version 2.0 consists of parallel Croatian-English texts crawled from the .hr top-level domain for Croatia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing used for crawl...

  • Research data . 2016 . Embargo End Date: 25 Apr 2016
    Open Access
    Authors: 
    Mozetič, Igor; Grčar, Miha; Smailović, Jasmina;
    Publisher: Jožef Stefan Institute
    Project: EC | SIMPOL (610704)

    The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora for the corresponding 15 European languages. The data can be used to train and evaluate Twitter sentiment classifiers, to compute annotator...

  • Research data . 2016 . Embargo End Date: 05 Mar 2016
    Open Access
    Authors: 
    Ljubešić, Nikola; Klubička, Filip;
    Publisher: Faculty of Humanities and Social Sciences, University of Zagreb
    Project: EC | ABU-MATRAN (324414)

    hrLex is a large inflectional lexicon of the Croatian language in which each entry is a (wordform, lemma, MSD) triple. The MSD tagset follows the revised MULTEXT-East V4 tagset for Croatian and Serbian, available at https://github.com/ffnlp/sethr/blob/master/mte4r-upo...

  • Research data . 2014 . Embargo End Date: 22 May 2015
    Open Access
    Authors: 
    Erjavec, Tomaž;
    Publisher: Jožef Stefan Institute
    Project: EC | IMPACT (215064)

    The IMP digital library contains historical Slovene books and other publications: 658 texts in total, with over 45,000 pages, from the period 1584-1919. Each text contains extensive metadata, per-page links to facsimiles, and hand-corrected transcriptions with structural...