The following results are related to Digital Humanities and Cultural Heritage.
44 research outcomes, page 4 of 5
  • research data . 2016 . Embargo End Date: 09 Mar 2016
    Restricted
    Authors:
    Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio; Klubička, Filip; Toral, Antonio;
    Publisher: Jožef Stefan Institute
    Project: EC | ABU-MATRAN (324414)

    The srenWaC corpus consists of sentence aligned parallel Serbian-English texts crawled from the .rs top-level domain for Serbia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing used for cr...

  • research data . 2016 . Embargo End Date: 05 Mar 2016
    Open Access
    Authors:
    Ljubešić, Nikola; Klubička, Filip;
    Publisher: Faculty of Humanities and Social Sciences, University of Zagreb
    Project: EC | ABU-MATRAN (324414)

    hrLex is a large inflectional lexicon of the Serbian language where each entry consists of a (wordform, lemma, MSD) triple. The MSD tagset follows the revised MULTEXT-East V4 tagset for Croatian and Serbian, available at https://github.com/ffnlp/sethr/blob/master/mte4r-upo...

  • research data . 2016 . Embargo End Date: 05 Mar 2016
    Open Access
    Authors:
    Ljubešić, Nikola; Klubička, Filip;
    Publisher: Faculty of Humanities and Social Sciences, University of Zagreb
    Project: EC | ABU-MATRAN (324414)

    hrLex is a large inflectional lexicon of the Croatian language where each entry consists of a (wordform, lemma, MSD) triple. The MSD tagset follows the revised MULTEXT-East V4 tagset for Croatian and Serbian, available at https://github.com/ffnlp/sethr/blob/master/mte4r-up...

  • research data . 2016 . Embargo End Date: 25 Apr 2016
    Open Access
    Authors:
    Mozetič, Igor; Grčar, Miha; Smailović, Jasmina;
    Publisher: Jožef Stefan Institute
    Project: EC | SIMPOL (610704)

    The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora for the corresponding 15 European languages. The data can be used to train and evaluate Twitter sentiment classifiers, to compute annotator...

  • research data . 2016 . Embargo End Date: 29 Jan 2016
    Restricted
    Authors:
    Toral, Antonio; Esplà-Gomis, Miquel; Klubička, Filip; Ljubešić, Nikola; Papavassiliou, Vassilis; Prokopidis, Prokopis; Rubino, Raphael; Way, Andy;
    Publisher: Abu-MaTran project
    Project: EC | ABU-MATRAN (324414)

    Sentence aligned parallel corpus built by automatically crawling 25 websites from the tourism domain.

  • research data . 2016 . Embargo End Date: 12 Jul 2017
    Open Access
    Authors:
    Grčar, Miha; Cherepnalkoski, Darko; Mozetič, Igor; Kralj Novak, Petra;
    Publisher: Jožef Stefan Institute
    Project: EC | DOLFINS (640772)

    The corpus contains over 4.5 million tweets (tweet IDs) automatically labeled by a machine learning program with stance regarding Brexit: Positive (supporting Brexit), Negative (opposing Brexit), or Neutral (uncommitted). The Brexit referendum was held on June 23, 2016,...

  • research data . 2015 . Embargo End Date: 14 Apr 2016
    Open Access
    Authors:
    Kralj Novak, Petra; Smailović, Jasmina; Sluban, Borut; Mozetič, Igor;
    Publisher: Jožef Stefan Institute
    Project: EC | SIMPOL (610704)

    A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages. The process and analysis of emoji sentiment ranking is described in the paper: Kralj Novak P, ...

  • research data . 2015 . Embargo End Date: 07 May 2015
    Open Access
    Authors:
    Erjavec, Tomaž;
    Publisher: Jožef Stefan Institute
    Project: EC | IMPACT (215064)

    goo300k is a manually annotated reference corpus of historical Slovene. It contains 1,100 pages (about 300,000 tokens) sampled from 89 texts from the period 1584-1899. Each text contains extensive meta-data and per-page links to facsimiles, while the word tokens in the ...

  • research data . 2014 . Embargo End Date: 25 May 2015
    Open Access
    Authors:
    Erjavec, Tomaž;
    Publisher: Jožef Stefan Institute
    Project: EC | IMPACT (215064)

    The imp25k lexicon of historical Slovene was created automatically from the goo300k and foo3M annotated corpora and contains attested and manually verified word forms and their annotations with examples of use. A lexicon entry contains the modern lemma with its part-of-...

  • research data . 2014 . Embargo End Date: 22 May 2015
    Open Access
    Authors:
    Erjavec, Tomaž;
    Publisher: Jožef Stefan Institute
    Project: EC | IMPACT (215064)

    The IMP digital library contains historical Slovene books and other publications, a total of 658 texts with over 45,000 pages from the period 1584-1919. Each text contains extensive meta-data, per-page links to facsimiles, and hand-corrected transcriptions with structural...
