Advanced search in Research products
Research products
arrow_drop_down
Searching FieldsTerms
Any field
arrow_drop_down
includes
arrow_drop_down
The following results are related to Digital Humanities and Cultural Heritage. Are you interested to view more results? Visit OpenAIRE - Explore.
91 Research products

  • Digital Humanities and Cultural Heritage
  • Research data
  • Other research products
  • European Commission
  • CLARIN

10
arrow_drop_down
Relevance
arrow_drop_down
  • image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
    Authors: Specia, Lucia; Logacheva, Varvara; Blain, Frederic; Fernandez, Ramon; +1 Authors

    Training and development data for the WMT18 QE task. Test data will be published as a separate item. This shared task will build on its previous six editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks make use of datasets produced from post-editions by professional translators. The datasets are domain-specific (IT and life sciences/pharma domains) and extend from those used previous years with more instances and more languages. One important addition is that this year we also include datasets with neural MT outputs. In addition to advancing the state of the art at all prediction levels, our specific goals are: To study the performance of quality estimation approaches on the output of neural MT systems. We will do so by providing datasets for two language language pairs where the same source segments are translated by both a statistical phrase-based and a neural MT system. To study the predictability of deleted words, i.e. words that are missing in the MT output. TO do so, for the first time we provide data annotated for such errors at training time. To study the effectiveness of explicitly assigned labels for phrases. We will do so by providing a dataset where each phrase in the output of a phrase-based statistical MT system was annotated by human translators. To study the effect of different language pairs. We will do so by providing datasets created in similar ways for four language language pairs. To investigate the utility of detailed information logged during post-editing. We will do so by providing post-editing time, keystrokes, and actual edits. Measure progress over years at all prediction levels. We will do so by using last year's test set for comparative experiments. In-house statistical and neural MT systems were built to produce translations for all tasks. MT system-dependent information can be made available under request. The data is publicly available but since it has been provided by our industry partners it is subject to specific terms and conditions. However, these have no practical implications on the use of this data for research purposes. Participants are allowed to explore any additional data and resources deemed relevant.

    image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
  • image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    Authors: Galuščáková, Petra; Garabík, Radovan; Bojar, Ondřej;

    Czech-Slovak parallel corpus consisting of several freely available corpora (Acquis [1], Europarl [2], Official Journal of the European Union [3] and part of OPUS corpus [4] – EMEA, EUConst, KDE4 and PHP) and downloaded website of European Commission [5]. Corpus is published in both in plaintext format and with an automatic morphological annotation. References: [1] http://langtech.jrc.it/JRC-Acquis.html/ [2] http://www.statmt.org/europarl/ [3] http://apertium.eu/data [4] http://opus.lingfil.uu.se/ [5] http://ec.europa.eu/

    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
  • image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    Authors: Bojar, Ondřej; Straňák, Pavel; Zeman, Daniel;

    A Hindi corpus of texts downloaded mostly from news sites. Contains both the original raw texts and an extensively cleaned-up and tokenized version suitable for language modeling. 18M sentences, 308M tokens

    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
  • image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
    Authors: Turchi, Marco; Negri, Matteo; Chatterjee, Rajen;

    Test data for the WMT 2018 Automatic post-editing task. They consist in English-German pairs (source and target) belonging to the information technology domain and already tokenized. Test set contains 2,000 pairs. A phrase-based machine translation system has been used to generate the target segments. This test set is sampled from the same dataset used for the 2016 and 2017 APE shared task editions. All data is provided by the EU project QT21 (http://www.qt21.eu/).

    image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
  • image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    Authors: Çano, Erion;

    OAGK is a keyword extraction/generation dataset consisting of 2.2 million abstracts, titles and keyword strings from cientific articles. Texts were lowercased and tokenized with Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are stored as JSON lines in each text file. This data is derived from OAG data collection (https://aminer.org/open-academic-graph) which was released under ODC-BY licence. This data (OAGK Keyword Generation Dataset) is released under CC-BY licence (https://creativecommons.org/licenses/by/4.0/). If using it, please cite the following paper: Çano, Erion and Bojar, Ondřej, 2019, Keyphrase Generation: A Text Summarization Struggle, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics, June 2019, Minneapolis, USA

    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
  • image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    Authors: Hoang, Duc Tam; Bojar, Ondřej;

    CsEnVi Pairwise Parallel Corpora consist of Vietnamese-Czech parallel corpus and Vietnamese-English parallel corpus. The corpora were assembled from the following sources: - OPUS, the open parallel corpus is a growing multilingual corpus of translated open source documents. The majority of Vi-En and Vi-Cs bitexts are subtitles from movies and television series. The nature of the bitexts are paraphrasing of each other's meaning, rather than translations. - TED talks, a collection of short talks on various topics, given primarily in English, transcribed and with transcripts translated to other languages. In our corpus, we use 1198 talks which had English and Vietnamese transcripts available and 784 talks which had Czech and Vietnamese transcripts available in January 2015. The size of the original corpora collected from OPUS and TED talks is as follows: CS/VI EN/VI Sentence 1337199/1337199 2035624/2035624 Word 9128897/12073975 16638364/17565580 Unique word 224416/68237 91905/78333 We improve the quality of the corpora in two steps: normalizing and filtering. In the normalizing step, the corpora are cleaned based on the general format of subtitles and transcripts. For instance, sequences of dots indicate explicit continuation of subtitles across multiple time frames. The sequences of dots are distributed differently in the source and the target side. Removing the sequence of dots, along with a number of other normalization rules, improves the quality of the alignment significantly. In the filtering step, we adapt the CzEng filtering tool [1] to filter out bad sentence pairs. The size of cleaned corpora as published is as follows: CS/VI EN/VI Sentence 1091058/1091058 1113177/1091058 Word 6718184/7646701 8518711/8140876 Unique word 195446/59737 69513/58286 The corpora are used as training data in [2]. References: [1] Ondřej Bojar, Zdeněk Žabokrtský, et al. 2012. The Joy of Parallelism with CzEng 1.0. Proceedings of LREC2012. ELRA. Istanbul, Turkey. [2] Duc Tam Hoang and Ondřej Bojar, The Prague Bulletin of Mathematical Linguistics. Volume 104, Issue 1, Pages 75–86, ISSN 1804-0462. 9/2015

    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
  • image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
    Authors: Turchi, Marco; Chatterjee, Rajen; Negri, Matteo;

    Human post-edited test sentences for the WMT 2017 Automatic post-editing task. This consists in 2,000 German sentences belonging to the IT domain and already tokenized. Source and target segments can be downloaded from: https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2133. All data is provided by the EU project QT21 (http://www.qt21.eu/).

    image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
  • image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    Authors: Nedoluzhko, Anna; Novák, Michal; Popel, Martin; Žabokrtský, Zdeněk; +1 Authors

    CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 0.1 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All the datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs located in the MISC column. The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The publicly available edition is distributed via LINDAT-CLARIAH-CZ and contains 13 datasets for 10 languages (1 dataset for Catalan, 2 for Czech, 2 for English, 1 for French, 2 for German, 1 for Hungarian, 1 for Lithuanian, 1 for Polish, 1 for Russian, and 1 for Spanish), excluding the test data. The non-public edition is available internally to ÚFAL members and contains additional 4 datasets for 2 languages (1 dataset for Dutch, and 3 for English), which we are not allowed to distribute due to their original license limitations. It also contains the test data portions for all datasets. When using any of the harmonized datasets, please get acquainted with its license (placed in the same directory as the data) and cite the original data resource too. References to original resources whose harmonized versions are contained in the public edition of CorefUD 0.1: - Catalan-AnCora: Recasens, M. and Martí, M. A. (2010). AnCora-CO: Coreferentially Annotated Corpora for Spanish and Catalan. Language Resources and Evaluation, 44(4):315–345 - Czech-PCEDT: Nedoluzhko, A., Novák, M., Cinková, S., Mikulová, M., and Mírovský, J. (2016). Coreference in Prague Czech-English Dependency Treebank. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 169–176, Portorož, Slovenia. European Language Resources Association. - Czech-PDT: Hajič, J., Bejček, E., Hlaváčová, J., Mikulová, M., Straka, M., Štěpánek, J., and Štěpánková, B. (2020). Prague Dependency Treebank - Consolidated 1.0. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pages 5208–5218, Marseille, France. European Language Resources Association. - English-GUM: Zeldes, A. (2017). The GUM Corpus: Creating Multilayer Resources in the Classroom. Language Resources and Evaluation, 51(3):581–612. - English-ParCorFull: Lapshinova-Koltunski, E., Hardmeier, C., and Krielke, P. (2018). ParCorFull: a Parallel Corpus Annotated with Full Coreference. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association. - French-Democrat: Landragin, F. (2016). Description, modélisation et détection automatique des chaı̂nes de référence (DEMOCRAT). Bulletin de l’Association Française pour l’Intelligence Artificielle, (92):11–15. - German-ParCorFull: Lapshinova-Koltunski, E., Hardmeier, C., and Krielke, P. (2018). ParCorFull: a Parallel Corpus Annotated with Full Coreference. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association - German-PotsdamCC: Bourgonje, P. and Stede, M. (2020). The Potsdam Commentary Corpus 2.2: Extending annotations for shallow discourse parsing. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 1061–1066, Marseille, France. European Language Resources Association. - Hungarian-SzegedKoref: Vincze, V., Hegedűs, K., Sliz-Nagy, A., and Farkas, R. (2018). SzegedKoref: A Hungarian Coreference Corpus. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association. - Lithuanian-LCC: Žitkus, V. and Butkienė, R. (2018). Coreference Annotation Scheme and Corpus for Lithuanian Language. In Fifth International Conference on Social Networks Analysis, Management and Security, SNAMS 2018, Valencia, Spain, October 15-18, 2018, pages 243–250. IEEE. - Polish-PCC: Ogrodniczuk, M., Glowińska, K., Kopeć, M., Savary, A., and Zawisławska, M. (2013). Polish coreference corpus. In Human Language Technology. Challenges for Computer Science and Linguistics - 6th Language and Technology Conference, LTC 2013, Poznań, Poland, December 7-9, 2013. Revised Selected Papers, volume 9561 of Lecture Notes in Computer Science, pages 215–226. Springer. - Russian-RuCor: Toldova, S., Roytberg, A., Ladygina, A. A., Vasilyeva, M. D., Azerkovich, I. L., Kurzukov,M., Sim, G., Gorshkov, D. V., Ivanova, A., Nedoluzhko, A., and Grishina, Y. (2014). Evaluating Anaphora and Coreference Resolution for Russian. In Komp’juternaja lingvistika i intellektual’nye tehnologii. Po materialam ezhegodnoj Mezhdunarodnoj konferencii Dialog, pages 681–695. - Spanish-AnCora: Recasens, M. and Martí, M. A. (2010). AnCora-CO: Coreferentially Annotated Corpora for Spanish and Catalan. Language Resources and Evaluation, 44(4):315–345 References to original resources whose harmonized versions are contained in the ÚFAL-internal edition of CorefUD 0.1: - Dutch-COREA: Hendrickx, I., Bouma, G., Coppens, F., Daelemans, W., Hoste, V., Kloosterman, G., Mineur, A.-M., Van Der Vloet, J., and Verschelde, J.-L. (2008). A coreference corpus and resolution system for Dutch. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco. European Language Resources Association. - English-ARRAU: Uryupina, O., Artstein, R., Bristot, A., Cavicchio, F., Delogu, F., Rodriguez, K. J., and Poesio, M. (2020). Annotating a broad range of anaphoric phenomena, in a variety of genres: the ARRAU Corpus. Natural Language Engineering, 26(1):95–128. - English-OntoNotes: Weischedel, R., Hovy, E., Marcus, M., Palmer, M., Belvin, R., Pradhan, S., Ramshaw, L., and Xue, N. (2011). Ontonotes: A large training corpus for enhanced processing. In Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation, pages 54–63, New York. Springer-Verlag. - English-PCEDT: Nedoluzhko, A., Novák, M., Cinková, S., Mikulová, M., and Mírovský, J. (2016). Coreference in Prague Czech-English Dependency Treebank. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pages 169–176, Portorož, Slovenia. European Language Resources Association.

    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
  • image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    Authors: Mírovský, Jiří; Mikulová, Marie; Nedoluzhko, Anna; Novák, Michal; +1 Authors

    We present an extended version of the Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0). It includes all annotation of coreference (the original one from PCEDT 2.0 as well as the new one) and improved cross-lingual alignment of coreferential expressions. The corpus released as PCEDT 2.0 Coref is publicly available.

    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ Biblio at Institute ...arrow_drop_down
    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    LINDAT/CLARIAH-CZ repository
    Other ORP type . 2016
    Data sources: B2FIND
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ Biblio at Institute ...arrow_drop_down
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
      LINDAT/CLARIAH-CZ repository
      Other ORP type . 2016
      Data sources: B2FIND
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
  • image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
    Authors: Turchi, Marco; Chatterjee, Rajen; Negri, Marco;

    Test data for the WMT 2017 Automatic post-editing task (the same used for the Sentence-level Quality Estimation task). They consist in German-English triplets (source and target) belonging to the pharmacological domain and already tokenized. Test set contains 2,000 pairs. All data is provided by the EU project QT21 (http://www.qt21.eu/).

    image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
Advanced search in Research products
Research products
arrow_drop_down
Searching FieldsTerms
Any field
arrow_drop_down
includes
arrow_drop_down
The following results are related to Digital Humanities and Cultural Heritage. Are you interested to view more results? Visit OpenAIRE - Explore.
91 Research products
  • image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
    Authors: Specia, Lucia; Logacheva, Varvara; Blain, Frederic; Fernandez, Ramon; +1 Authors

    Training and development data for the WMT18 QE task. Test data will be published as a separate item. This shared task will build on its previous six editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks make use of datasets produced from post-editions by professional translators. The datasets are domain-specific (IT and life sciences/pharma domains) and extend from those used previous years with more instances and more languages. One important addition is that this year we also include datasets with neural MT outputs. In addition to advancing the state of the art at all prediction levels, our specific goals are: To study the performance of quality estimation approaches on the output of neural MT systems. We will do so by providing datasets for two language language pairs where the same source segments are translated by both a statistical phrase-based and a neural MT system. To study the predictability of deleted words, i.e. words that are missing in the MT output. TO do so, for the first time we provide data annotated for such errors at training time. To study the effectiveness of explicitly assigned labels for phrases. We will do so by providing a dataset where each phrase in the output of a phrase-based statistical MT system was annotated by human translators. To study the effect of different language pairs. We will do so by providing datasets created in similar ways for four language language pairs. To investigate the utility of detailed information logged during post-editing. We will do so by providing post-editing time, keystrokes, and actual edits. Measure progress over years at all prediction levels. We will do so by using last year's test set for comparative experiments. In-house statistical and neural MT systems were built to produce translations for all tasks. MT system-dependent information can be made available under request. The data is publicly available but since it has been provided by our industry partners it is subject to specific terms and conditions. However, these have no practical implications on the use of this data for research purposes. Participants are allowed to explore any additional data and resources deemed relevant.

    image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
  • image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    Authors: Galuščáková, Petra; Garabík, Radovan; Bojar, Ondřej;

    Czech-Slovak parallel corpus consisting of several freely available corpora (Acquis [1], Europarl [2], Official Journal of the European Union [3] and part of OPUS corpus [4] – EMEA, EUConst, KDE4 and PHP) and downloaded website of European Commission [5]. Corpus is published in both in plaintext format and with an automatic morphological annotation. References: [1] http://langtech.jrc.it/JRC-Acquis.html/ [2] http://www.statmt.org/europarl/ [3] http://apertium.eu/data [4] http://opus.lingfil.uu.se/ [5] http://ec.europa.eu/

    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
  • image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    Authors: Bojar, Ondřej; Straňák, Pavel; Zeman, Daniel;

    A Hindi corpus of texts downloaded mostly from news sites. Contains both the original raw texts and an extensively cleaned-up and tokenized version suitable for language modeling. 18M sentences, 308M tokens

    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
  • image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
    Authors: Turchi, Marco; Negri, Matteo; Chatterjee, Rajen;

    Test data for the WMT 2018 Automatic post-editing task. They consist in English-German pairs (source and target) belonging to the information technology domain and already tokenized. Test set contains 2,000 pairs. A phrase-based machine translation system has been used to generate the target segments. This test set is sampled from the same dataset used for the 2016 and 2017 APE shared task editions. All data is provided by the EU project QT21 (http://www.qt21.eu/).

    image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
      addClaim

      This Research product is the result of merged Research products in OpenAIRE.

      You have already added works in your ORCID record related to the merged Research product.
  • image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    Authors: Çano, Erion;

    OAGK is a keyword extraction/generation dataset consisting of 2.2 million abstracts, titles and keyword strings from cientific articles. Texts were lowercased and tokenized with Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are stored as JSON lines in each text file. This data is derived from OAG data collection (https://aminer.org/open-academic-graph) which was released under ODC-BY licence. This data (OAGK Keyword Generation Dataset) is released under CC-BY licence (https://creativecommons.org/licenses/by/4.0/). If using it, please cite the following paper: Çano, Erion and Bojar, Ondřej, 2019, Keyphrase Generation: A Text Summarization Struggle, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics, June 2019, Minneapolis, USA

    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

    You have already added works in your ORCID record related to the merged Research product.
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ LINDAT/CLARIN reposi...arrow_drop_down
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/