Advanced search in Research outcomes
Filters: Digital Humanities and Cultural Heritage; Research data; EU; CLARIN.SI repository
- Research data, 2022. Embargo end date: 29 Jul 2022. Open Access.
  Authors: Martelli, Federico; Navigli, Roberto; Krek, Simon; Kallas, Jelena; Gantar, Polona; Koeva, Svetla; Nimb, Sanni; Sandford Pedersen, Bolette; Olsen, Sussi; Langemets, Margit; ...
  Persistent identifier (handle): 11356/1674
  Publisher: Jožef Stefan Institute. Project: EC | ELEXIS (731015).
  ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.0 contains sentences for 10 languages: Bulgarian, Danish, English, Spanish, Estonian, Hungarian, Italian, Dutch, Portugues...
- Research data, 2022. Embargo end date: 30 Jun 2022. Open Access.
  Authors: Koloski, Boshko; Martinc, Matej; Tavchioski, Ilija; Škrlj, Blaž; Pollak, Senja
  Persistent identifier (handle): 11356/1495
  Publisher: Jožef Stefan Institute. Project: EC | EMBEDDIA (825153).
  The dataset consists of 7514 Slovenian news articles from the SentiNews 1.0 corpus by Bučar et al. 2017 (http://hdl.handle.net/11356/1110) for which article keywords were available. We provide the train and test data splits (5995 articles for training and 1519 for testing) t...
- Research data, 2022. Embargo end date: 24 Feb 2022. Open Access.
  Authors: Freienthal, Linda; Pelicon, Andraž; Martinc, Matej; Škrlj, Blaž; Krustok, Ivar; Pranjić, Marko; Cabrera-Diego, Luis Adrián; Purver, Matthew; Pollak, Senja; Kuulmets, Hele-Andra; ...
  Persistent identifier (handle): 11356/1485
  Publisher: Ekspress Meedia Group. Project: EC | EMBEDDIA (825153).
  This dataset contains articles from EMBEDDIA media partners, enriched with information produced by the tools developed within the EMBEDDIA project:
  - 12,390 Estonian articles from 2019 with tags assigned by Ekspress Meedia. The complete dataset without the output of EMBEDDIA to...
- Research data, 2022. Embargo end date: 10 Feb 2022. Restricted.
  Authors: Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko; Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola; Ferme, Marko; Borovič, Mladen; Boškovič, Borko; Ojsteršek, Milan; ...
  Persistent identifier (handle): 11356/1447
  Publisher: Faculty of Electrical Engineering and Computer Science, University of Maribor. Project: EC | EMBEDDIA (825153).
  The machine translation datasets KAS-MT 1.0 contain automatically sentence-aligned Slovene and English plain-text abstracts from KAS-Abs 2.0 (http://hdl.handle.net/11356/1449) and are intended for studies in machine translation. The sentence alignment approach used requires ...
- Research data, 2022. Embargo end date: 10 Feb 2022. Restricted.
  Authors: Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko; Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola; Ferme, Marko; Borovič, Mladen; Boškovič, Borko; Ojsteršek, Milan; ...
  Persistent identifier (handle): 11356/1446
  Publisher: Faculty of Electrical Engineering and Computer Science, University of Maribor. Project: EC | EMBEDDIA (825153).
  Summarization datasets were created from the text bodies in the KAS 2.0 corpus (http://hdl.handle.net/11356/1448) and the abstracts from the KAS-Abs 2.0 corpus (http://hdl.handle.net/11356/1449). The monolingual slo2slo dataset contains 69,730 Slovene abstracts and Slov...
- Research data, 2022. Embargo end date: 10 Feb 2022. Restricted.
  Authors: Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko; Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola; Ferme, Marko; Borovič, Mladen; Boškovič, Borko; Ojsteršek, Milan; ...
  Persistent identifier (handle): 11356/1449
  Publisher: Faculty of Electrical Engineering and Computer Science, University of Maribor. Project: EC | EMBEDDIA (825153).
  The KAS-Abs 2.0 corpus contains 125,202 automatically identified Slovenian and/or English abstracts from BSc/BA, MSc/MA, and PhD theses included in the KAS Corpus of Academic Slovene 2.0 (http://hdl.handle.net/11356/1448). The abstracts are either in Slovenian (*-abs-sl...
- Research data, 2022. Embargo end date: 04 Feb 2022. Restricted.
  Authors: Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko; Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola; Ferme, Marko; Borovič, Mladen; Boškovič, Borko; Ojsteršek, Milan; ...
  Persistent identifier (handle): 11356/1448
  Publisher: Faculty of Electrical Engineering and Computer Science, University of Maribor. Project: EC | EMBEDDIA (825153).
  The KAS corpus of Slovene academic writing consists of almost 65,000 BSc/BA, 16,000 MSc/MA, and 1,600 PhD theses (82,000 texts, 5 million pages, or 1.5 billion tokens) written between 2000 and 2018 and gathered from the digital libraries of Slovene higher education institution...
- Research data, 2021. Embargo end date: 04 Jun 2021. Open Access.
  Authors: Koloski, Boshko; Pollak, Senja; Škrlj, Blaž; Martinc, Matej
  Persistent identifier (handle): 11356/1403
  Publisher: Ekspress Meedia Group. Project: EC | EMBEDDIA (825153).
  EACL Hackashop Keyword Challenge Datasets. This repository contains the IDs of the articles used for the keyword extraction challenge at the EACL Hackashop on News Media Content Analysis and Automated Report Generation (http://embeddia.eu/hackashop2021/). The article IDs can ...
- Research data, 2021. Embargo end date: 24 May 2021. Open Access.
  Authors: Shekhar, Ravi; Purver, Matthew; Pollak, Senja; Pelicon, Andraž; Krustok, Ivar
  Persistent identifier (handle): 11356/1407
  Publisher: Ekspress Meedia Group. Project: EC | EMBEDDIA (825153).
  The dataset is an archive of reader comments from the Delfi news site from 2014 to 2019, containing approximately 12M comments, mostly in Latvian, with some in Russian. Description of the datasets: there are 6 CSV files: lv-comments-2014.csv contains 2 ...
- Research data, 2021. Embargo end date: 24 May 2021. Open Access.
  Authors: Shekhar, Ravi; Pollak, Senja; Pelicon, Andraž; Purver, Matthew; Krustok, Ivar
  Persistent identifier (handle): 11356/1401
  Publisher: Ekspress Meedia Group. Project: EC | EMBEDDIA (825153).
  This dataset is an archive of reader comments on the Ekspress Meedia news site from 2009 to 2019, containing approximately 31M comments, mostly in Estonian, with some in Russian. Description of the datasets: there are 11 CSV files: comments_2009.csv contains 2...