44 research outcomes, page 3 of 5
  • research data . 2016 . Embargo End Date: 26 Sep 2016
    Open Access
    Authors:
    Ljubešić, Nikola; Pirinen, Tommi; Toral, Antonio;
    Publisher: Jožef Stefan Institute
    Project: EC | ABU-MATRAN (324414)

    The Finnish web corpus fiWaC was built by crawling the .fi top-level domain in 2015 for both Finnish and English documents. The corpus was naively tokenised (by splitting on spaces), near-deduplicated at the paragraph level, and paragraph-shuffled. Each paragraph contains metadata on th...

  • research data . 2016 . Embargo End Date: 19 Sep 2016
    Open Access
    Authors:
    Ljubešić, Nikola; Klubička, Filip; Boras, Damir;
    Publisher: Faculty of Humanities and Social Sciences, University of Zagreb
    Project: EC | ABU-MATRAN (324414)

    srLex is a large inflectional lexicon of the Serbian language in which each entry is a (wordform, lemma, MSD, frequency, per-million frequency) 5-tuple. The frequencies of the (wordform, lemma, MSD) triples are calculated on the srWaC v1.2 corpus. The MSD tagset follows the M...

  • research data . 2016 . Embargo End Date: 19 Sep 2016
    Open Access
    Authors:
    Ljubešić, Nikola; Klubička, Filip; Boras, Damir;
    Publisher: Faculty of Humanities and Social Sciences, University of Zagreb
    Project: EC | ABU-MATRAN (324414)

    hrLex is a large inflectional lexicon of the Croatian language in which each entry is a (wordform, lemma, MSD, frequency, per-million frequency) 5-tuple. The frequencies of the (wordform, lemma, MSD) triples are calculated on the hrWaC v2.2 corpus. The MSD tagset follows the ...

  • research data . 2016 . Embargo End Date: 05 Aug 2016
    Open Access
    Authors:
    Cherepnalkoski, Darko; Karpf, Andreas; Mozetič, Igor; Grčar, Miha;
    Publisher: Jožef Stefan Institute
    Project: EC | DOLFINS (640772)

    The resource consists of two datasets related to Members of the 8th European Parliament (MEPs). The first one is a dataset of 2,535 roll-call votes of MEPs until 2016-03-01. The second one is a dataset of 26,133 retweets between MEPs in the period between 2014-10-01 and...

  • research data . 2016 . Embargo End Date: 23 Jun 2016
    Open Access
    Authors:
    Ljubešić, Nikola;
    Publisher: Faculty of Humanities and Social Sciences, University of Zagreb
    Project: EC | ABU-MATRAN (324414)

    srLex is a large inflectional lexicon of the Serbian language in which each entry is a (wordform, lemma, MSD, frequency, per-million frequency) 5-tuple. The frequencies of the (wordform, lemma, MSD) triples are calculated on the srWaC v1.2 corpus. The MSD tagset follows the M...

  • research data . 2016 . Embargo End Date: 24 Jun 2016
    Open Access
    Authors:
    Ljubešić, Nikola;
    Publisher: Faculty of Humanities and Social Sciences, University of Zagreb
    Project: EC | ABU-MATRAN (324414)

    hrLex is a large inflectional lexicon of the Croatian language in which each entry is a (wordform, lemma, MSD, frequency, per-million frequency) 5-tuple. The frequencies of the (wordform, lemma, MSD) triples are calculated on the hrWaC v2.2 corpus. The MSD tagset follows the ...

  • research data . 2016 . Embargo End Date: 29 May 2016
    Open Access
    Authors:
    Popović, Maja; Arčan, Mihael;
    Publisher: Insight Centre for Data Analytics, National University of Ireland, Galway
    Project: EC | TraMOOC (644333)

    The PE²rr corpus contains source language texts from different domains along with their automatically generated translations into several morphologically rich languages, their post-edited versions, and error annotations of the performed post-edit operations. The main ad...

  • research data . 2016 . Embargo End Date: 10 Mar 2016
    Restricted
    Authors:
    Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio; Klubička, Filip; Toral, Antonio;
    Publisher: Jožef Stefan Institute
    Project: EC | ABU-MATRAN (324414)

    The slenWaC corpus version 1.0 consists of parallel Slovene-English texts crawled from the .si top-level domain for Slovenia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing used for crawl...

  • research data . 2016 . Embargo End Date: 09 Mar 2016
    Restricted
    Authors:
    Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio; Klubička, Filip; Toral, Antonio;
    Publisher: Jožef Stefan Institute
    Project: EC | ABU-MATRAN (324414)

    The hrenWaC corpus version 2.0 consists of parallel Croatian-English texts crawled from the .hr top-level domain for Croatia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing used for crawl...

  • research data . 2016 . Embargo End Date: 09 Mar 2016
    Restricted
    Authors:
    Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio; Klubička, Filip; Toral, Antonio;
    Publisher: Jožef Stefan Institute
    Project: EC | ABU-MATRAN (324414)

    The fienWaC corpus version 1.0 consists of parallel Finnish-English texts crawled from the .fi top-level domain for Finland. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing used for crawli...
