The following results are related to Digital Humanities and Cultural Heritage.

  • Digital Humanities and Cultural Heritage
  • Research data
  • Dataset
  • Illinois Data Bank

  • Authors: Jiang, Ming; Dubnicek, Ryan; Worthey, Glen; Underwood, Ted; +1 Authors

    This is a sentence-level parallel corpus in support of research on OCR quality. The source data comes from: (1) Project Gutenberg for human-proofread "clean" sentences; and, (2) HathiTrust Digital Library for the paired sentences with OCR errors. In total, this corpus contains 167,079 sentence pairs from 189 sampled books in four domains (i.e., agriculture, fiction, social science, world war history) published from 1793 to 1984. There are 36,337 sentences that have two OCR views paired with each clean version. In addition to sentence texts, this corpus also provides the location (i.e., sentence and chapter index) of each sentence in its belonging Gutenberg volume.

    Illinois Data Bank
    Dataset . 2022
    License: CC BY
    Data sources: Datacite
  • Authors: Mishra, Sudhanshu; Prasad, Shivangi; Mishra, Shubhanshu;

    Models and predictions for our submission to TRAC-2020, the Second Workshop on Trolling, Aggression and Cyberbullying. Our approach is described in our paper: Mishra, Sudhanshu, Shivangi Prasad, and Shubhanshu Mishra. 2020. “Multilingual Joint Fine-Tuning of Transformer Models for Identifying Trolling, Aggression and Cyberbullying at TRAC 2020.” In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC-2020). The source code for training the models, along with more details, is available in our code repository (https://github.com/socialmediaie/TRAC2020). NOTE: These models were retrained for uploading here after our submission, so the evaluation measures may differ slightly from those reported in the paper.

    Illinois Data Bank
    Dataset . 2020
    License: CC BY
    Data sources: Datacite
  • Authors: Mishra, Shubhanshu;

    Dataset to be used for the SocialMediaIE tutorial.

    Illinois Data Bank
    Dataset . 2020
    License: CC BY
    Data sources: Datacite
  • Authors: Kozuch, Laura;

    Data in this spreadsheet presents basic information on Cahokia, Mound 72 shell artifacts. This includes taxonomic identifications, provenience, and bead measurements. There are five tabs: 1. Raw data; 2. Disk bead measurements; 3. Columella bead measurements; 4. Data on cups and pendants; and, 5. Information on whole shell beads.

  • Authors: Rando, Halie; Wadlington, William; Johnson, Jennifer; Stutchman, Jeremy; +3 Authors

    This dataset contains raw data associated with the red fox Y-chromosome assembly (see https://doi.org/10.3390/genes10060409). It includes a fasta file of the 171 scaffolds from the red fox reference genome assembly identified as likely to contain Y-chromosome sequence, the raw BLAST results, and the ABySS assemblies described in the manuscript.

    Illinois Data Bank
    Dataset . 2019
    License: CC BY
    Data sources: Datacite
  • Authors: Mishra, Shubhanshu;

    Trained models for multi-task multi-dataset learning for text classification in tweets. Classification tasks include sentiment prediction, abusive content, sarcasm, and veridictality. Models were trained using: https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_classification.py (see https://github.com/socialmediaie/SocialMediaIE and https://socialmediaie.github.io for details). If you are using this data, please also cite the related article: Shubhanshu Mishra. 2019. Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets. In Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT '19). ACM, New York, NY, USA, 283-284. DOI: https://doi.org/10.1145/3342220.3344929

    Illinois Data Bank
    Dataset . 2019
    License: CC BY
    Data sources: Datacite
  • Authors: Mishra, Shubhanshu;

    Trained models for multi-task multi-dataset learning for text classification as well as sequence tagging in tweets. Classification tasks include sentiment prediction, abusive content, sarcasm, and veridictality. Sequence tagging tasks include POS, NER, Chunking, and SuperSenseTagging. Models were trained using: https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_classification_tagging.py (see https://github.com/socialmediaie/SocialMediaIE and https://socialmediaie.github.io for details). If you are using this data, please also cite the related article: Shubhanshu Mishra. 2019. Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets. In Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT '19). ACM, New York, NY, USA, 283-284. DOI: https://doi.org/10.1145/3342220.3344929

    Illinois Data Bank
    Dataset . 2019
    License: CC BY
    Data sources: Datacite
  • Authors: Mishra, Shubhanshu;

    Trained models for multi-task multi-dataset learning for sequence tagging in tweets. Sequence tagging tasks include POS, NER, Chunking, and SuperSenseTagging. Models were trained using: https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_experiment.py (see https://github.com/socialmediaie/SocialMediaIE and https://socialmediaie.github.io for details). If you are using this data, please also cite the related article: Shubhanshu Mishra. 2019. Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets. In Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT '19). ACM, New York, NY, USA, 283-284. DOI: https://doi.org/10.1145/3342220.3344929

    Illinois Data Bank
    Dataset . 2019
    License: CC BY
    Data sources: Datacite
  • Authors: Kozuch, Laura; Walker, Karen; Marquardt, William;

    Spire angle data for sinistral whelks of the family Busyconidae. Data focuses on spire angles, with some data on total shell length. Locality information is present for all modern specimens.

    Illinois Data Bank
    Dataset . 2017
    License: CC 0
    Data sources: Datacite
Powered by OpenAIRE graph