7 Research products, page 1 of 1
- Research data . 2022 . Embargo End Date: 20 Jun 2022 . Authors: Jiang, Ming; Dubnicek, Ryan; Worthey, Glen; Underwood, Ted; Downie, J. Stephen . Publisher: University of Illinois at Urbana-Champaign
This is a sentence-level parallel corpus in support of research on OCR quality. The source data come from (1) Project Gutenberg, for human-proofread "clean" sentences, and (2) HathiTrust Digital Library, for the paired sentences with OCR errors. In total, the corpus contains 167,079 sentence pairs from 189 sampled books in four domains (agriculture, fiction, social science, and world war history) published from 1793 to 1984. There are 36,337 sentences that have two OCR views paired with each clean version. In addition to the sentence texts, the corpus provides the location (sentence and chapter index) of each sentence within the Gutenberg volume it belongs to.
- Research data . 2020 . Embargo End Date: 17 May 2020 . Authors: Mishra, Sudhanshu; Prasad, Shivangi; Mishra, Shubhanshu . Publisher: University of Illinois at Urbana-Champaign
Models and predictions for submission to TRAC 2020, the Second Workshop on Trolling, Aggression and Cyberbullying. Our approach is described in our paper: Mishra, Sudhanshu, Shivangi Prasad, and Shubhanshu Mishra. 2020. “Multilingual Joint Fine-Tuning of Transformer Models for Identifying Trolling, Aggression and Cyberbullying at TRAC 2020.” In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying (TRAC-2020). The source code for training this model and more details can be found in our code repository: https://github.com/socialmediaie/TRAC2020. NOTE: These models were retrained for upload here after our submission, so the evaluation measures may differ slightly from those reported in the paper.
- Research data . 2020 . Embargo End Date: 13 Oct 2020 . Authors: Kozuch, Laura . Publisher: University of Illinois at Urbana-Champaign
This spreadsheet presents basic information on shell artifacts from Mound 72 at Cahokia, including taxonomic identifications, provenience, and bead measurements. There are five tabs: 1. Raw data; 2. Disk bead measurements; 3. Columella bead measurements; 4. Data on cups and pendants; and 5. Information on whole shell beads.
- Research data . 2020 . Embargo End Date: 16 Jul 2020 . Authors: Mishra, Shubhanshu . Publisher: University of Illinois at Urbana-Champaign
Dataset for the SocialMediaIE tutorial.
- Research data . 2019 . Embargo End Date: 17 Sep 2019 . Authors: Mishra, Shubhanshu . Publisher: University of Illinois at Urbana-Champaign
Trained models for multi-task multi-dataset learning for text classification as well as sequence tagging in tweets. Classification tasks include sentiment prediction, abusive content, sarcasm, and veridictality. Sequence tagging tasks include POS, NER, Chunking, and SuperSenseTagging. Models were trained using: https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_classification_tagging.py. See https://github.com/socialmediaie/SocialMediaIE and https://socialmediaie.github.io for details. If you are using this data, please also cite the related article: Shubhanshu Mishra. 2019. Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets. In Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT '19). ACM, New York, NY, USA, 283-284. DOI: https://doi.org/10.1145/3342220.3344929
- Research data . 2019 . Embargo End Date: 17 Sep 2019 . Authors: Mishra, Shubhanshu . Publisher: University of Illinois at Urbana-Champaign
Trained models for multi-task multi-dataset learning for text classification in tweets. Classification tasks include sentiment prediction, abusive content, sarcasm, and veridictality. Models were trained using: https://github.com/socialmediaie/SocialMediaIE/blob/master/SocialMediaIE/scripts/multitask_multidataset_classification.py. See https://github.com/socialmediaie/SocialMediaIE and https://socialmediaie.github.io for details. If you are using this data, please also cite the related article: Shubhanshu Mishra. 2019. Multi-dataset-multi-task Neural Sequence Tagging for Information Extraction from Tweets. In Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT '19). ACM, New York, NY, USA, 283-284. DOI: https://doi.org/10.1145/3342220.3344929
- Research data . 2017 . Embargo End Date: 06 Sep 2017 . Authors: Kozuch, Laura; Walker, Karen; Marquardt, William . Publisher: University of Illinois at Urbana-Champaign
Spire angle data for sinistral whelks of the family Busyconidae. Data focuses on spire angles, with some data on total shell length. Locality information is present for all modern specimens.