The following results are related to Digital Humanities and Cultural Heritage. To view more results, visit OpenAIRE - Explore.
25 Research products, page 1 of 3

  • Digital Humanities and Cultural Heritage
  • Research data
  • IEEE DataPort

  • Research data . 2021
    Authors: 
    Mulwafu, Watipaso; Xiao, Guangyi;
    Publisher: IEEE DataPort

    This is a dataset containing 1,661 movie scripts. The movie scripts were extracted thanks to the RiTUAL Lab. It is a subset and variation of this dataset. For our part, we added age certificates and severity levels to it. These severity levels cover profanity, violence, and sexual content.

  • Authors: 
    Melendez Barros, Jose; De Bona, Glauber;
    Publisher: IEEE DataPort

    Aspect Sentiment Triplet Extraction (ASTE) is a subtask of Aspect-Based Sentiment Analysis (ABSA). It aims to extract aspect-opinion pairs from a sentence and identify the sentiment polarity associated with them. For instance, given the sentence "Large rooms and great breakfast", ASTE outputs the triplet set T = {(rooms, large, positive), (breakfast, great, positive)}. Although several approaches to ABSA have recently been proposed, those for Portuguese have mostly been limited to extracting aspects only, without addressing ASTE tasks. This work aims to develop a framework based on Deep Learning to perform the Aspect Sentiment Triplet Extraction task in Portuguese. The framework uses BERT as a context-aware sentence encoder, multiple parallel non-linear layers to obtain aspect and opinion representations, and a Graph Attention layer along with a Biaffine scorer to determine the sentiment dependency between each aspect-opinion pair. The comparison results show that our proposed framework significantly outperforms the baselines in Portuguese and is competitive with its counterparts in English.

  • Authors: 
    Ilyevsky, Thomas Victor; Johansen, Jared Sigurd; Siskind, Jeffrey Mark;
    Publisher: IEEE DataPort

    Dataset associated with a paper in the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems, "Talk the talk and walk the walk: Dialogue-driven navigation in unknown indoor environments". If you use this code or data, please cite the above paper.

  • Authors: 
    Aberkane, Abdel-Jaouad;
    Publisher: IEEE DataPort

    The General Data Protection Regulation (GDPR), in effect since 2018, profoundly impacts information-processing organizations, as they must comply with this regulation. In this research, we consider GDPR compliance as a high-level goal in software development that should be addressed at the outset of software development, meaning during requirements engineering (RE). In this work, we hypothesize that Natural Language Processing (NLP) can offer a viable means to automate this process. We conducted a systematic mapping study to explore the existing literature on the intersection of GDPR, RE, and NLP. As a result, we identified 448 relevant studies, of which the majority (420) were related to NLP and RE. Research on the intersection of GDPR and NLP yielded nine studies, while 20 studies were related to GDPR and RE. Even though only one study was identified on the convergence of GDPR, NLP, and RE, the mapping results indicate opportunities for bridging the gap between these fields. In particular, we identified possibilities for introducing NLP techniques to automate manual RE tasks at the intersection of GDPR and RE, in addition to possibilities for using NLP-based machine learning techniques to achieve GDPR compliance in RE.

  • Research data . 2021
    Authors: 
    Loose, Davis;
    Publisher: IEEE DataPort

    A selection of the COVID-19 Open Research Dataset used for exploring the efficacy of the LDaRM text analytics technique.

  • Open Access

    Depression is currently one of the most common problems in our society. Most of the time, people commit suicide because of depression. To date, there is no proper laboratory test for detecting depression. Generally, doctors detect depression by asking knowledge-based questions. On the other hand, a large number of people now use social media platforms, where they share their daily experiences, emotions, and other activities with their friends. Twitter is one of the most common social platforms and is also popular for data collection. I collected these datasets from Twitter based on a set of depressive words. I hope that these Twitter datasets will help researchers detect depression more precisely.

  • Authors: 
    Chen, Bernard;
    Publisher: IEEE DataPort

    Wine has been popular with the public for centuries; in the market, there are a variety of wines to choose from. Among all wine regions, Bordeaux, France, is considered the most famous in the world. In this paper, we try to understand Bordeaux wines made in the 21st century through a Wineinformatics study. We developed and studied two datasets: the first dataset contains all Bordeaux wines from 2000 to 2016, and the second contains all wines listed in a famous collection of Bordeaux wines, the 1855 Bordeaux Wine Official Classification, from 2000 to 2016. A total of 14,349 wine reviews are collected in the first dataset, and 1,359 wine reviews in the second dataset. In order to understand the relation between wine quality and characteristics, a Naïve Bayes classifier is applied to predict the quality class (90+/89−) of wines. A Support Vector Machine (SVM) classifier is also applied as a comparison. In the first dataset, the SVM classifier achieves the best accuracy of 86.97%; in the second dataset, the Naïve Bayes classifier achieves the best accuracy of 84.62%. Precision, recall, and F-score are also used as measures to describe the performance of our models. Meaningful features associated with high-quality 21st-century Bordeaux wines are presented through this research.

  • Authors: 
    Chen, Bernard;
    Publisher: IEEE DataPort

    Wine has been popular with the public for centuries; in the market, there are a variety of wines to choose from. Among all wine regions, Bordeaux, France, is considered the most famous in the world. In this paper, we try to understand Bordeaux wines made in the 21st century through a Wineinformatics study. We developed and studied two datasets: the first dataset contains all Bordeaux wines from 2000 to 2016, and the second contains all wines listed in a famous collection of Bordeaux wines, the 1855 Bordeaux Wine Official Classification, from 2000 to 2016. A total of 14,349 wine reviews are collected in the first dataset, and 1,359 wine reviews in the second dataset. In order to understand the relation between wine quality and characteristics, a Naïve Bayes classifier is applied to predict the quality class (90+/89−) of wines. A Support Vector Machine (SVM) classifier is also applied as a comparison. In the first dataset, the SVM classifier achieves the best accuracy of 86.97%; in the second dataset, the Naïve Bayes classifier achieves the best accuracy of 84.62%. Precision, recall, and F-score are also used as measures to describe the performance of our models. Meaningful features associated with high-quality 21st-century Bordeaux wines are presented through this research.

  • This dataset is a collection of images and their respective labels containing examples of multiple Brazilian coins. Its primary purpose is to support the development of Computer Vision techniques for automatic detection of such objects, i.e., localization and classification tasks. It contains coins of R$ 0.05, 0.10, 0.25, 0.50 and 1.00 in Brazilian currency from the 2nd family, as manufactured by Casa da Moeda (http://www.casadamoeda.gov.br) since 2010. The samples were collected with a mobile phone and contain multiple coins placed upon a flat white A4 sheet of paper. Labels were produced by a group of several individuals of both sexes and reviewed in detail. Each label has a circular or polygonal shape and denotes the value, in cents, of the coin it relates to. This dataset improves on Brazilian Coins (available at https://www.kaggle.com/lgmoneda/br-coins) by adding location labels.

  • Authors: 
    Hyun, Young Geun; Ko, Jindeuk; Han, Jeong Hyeon;
    Publisher: IEEE DataPort

    The age of Artificial Intelligence (AI) is coming. Since Natural Language Processing (NLP) is a core AI technology for communication between humans and devices, it is vital to understand its technological trends. Early research on NLP focused on syntactic processing such as information extraction and subject modeling but later developed into semantics-oriented analysis. To analyze technological trends concerning NLP, especially semantic analysis, patent data, which contains objective and extensive information, is analyzed. The analysis procedure uses text mining to collect patent information, followed by pre-processing and analysis of keyword frequency, keyword networks, and time series. The results reveal a difference in the direction of technological development, as the core keywords differ in frequency and centrality among countries. In addition, from the time series analysis of five intervals over 20 years, twelve keywords with rising or falling trends are observed in the US, seven in the EU, and five in Korea. The greater number of keywords implies that the US underwent further technological progress compared to other countries. Moreover, the technical linkage between the US and the EU is presumed to be stronger than that between the US and Korea, based on keyword similarity over time. The results of this study can be used as valuable references for future technical predictions related to NLP.
