Advanced search in Research products
The following results are related to Digital Humanities and Cultural Heritage. Are you interested in viewing more results? Visit OpenAIRE - Explore.
22 Research products, page 1 of 3

  • Digital Humanities and Cultural Heritage
  • Research software
  • Code Ocean

Sorted by: Date (most recent)
  • Open Access English
    Authors: 
    Jinhang Jiang; Srinivasan, Karthik;
    Publisher: Code Ocean

    MoreThanSentiments (Jiang and Srinivasan, 2022) is a Python library written to help researchers calculate Boilerplate (Lang and Stice-Lawrence, 2015), Redundancy (Cazier and Pfeiffer, 2017), Specificity (Hope et al., 2016), Relative Prevalence (Blankespoor, 2019), and other measures. It is inspired by the idea that properly quantifying text structure helps researchers extract a wealth of meaningful information. This domain-independent package is easy to integrate into various text quantification projects.
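
    As an illustration, the sketch below approximates the Redundancy measure in the spirit of Cazier and Pfeiffer (2017), as the share of word n-grams that occur more than once in a document. This is a minimal re-implementation for illustration only, not the library's actual API; consult the MoreThanSentiments documentation for the real function signatures.

    ```python
    from collections import Counter

    def redundancy(text: str, n: int = 10) -> float:
        """Share of word n-grams that are repeated within a document.

        A minimal sketch of the redundancy idea; the real
        MoreThanSentiments implementation may tokenize and
        normalize text differently.
        """
        tokens = text.lower().split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        if not ngrams:
            return 0.0
        counts = Counter(ngrams)
        repeated = sum(c for c in counts.values() if c > 1)
        return repeated / len(ngrams)

    # A highly repetitive toy document scores close to 1.0.
    print(redundancy("the firm reported stable revenue growth this quarter " * 4))
    ```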

  • English
    Authors: 
    Amirhosein Bodaghi;
    Publisher: Code Ocean

    This code takes a number of tweets as input and delivers a semantic graph of the relationships between the entities in those tweets' text. To this end, it first performs a series of text-cleaning steps, then proceeds with entity extraction and resolution, which occur in multiple stages. Finally, the code creates a graph in which nodes represent the entities and a link between two nodes indicates the co-occurrence of those entities in at least one tweet of the input data.
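
    A minimal sketch of the final graph-building step, assuming entity extraction and resolution have already produced one list of entities per tweet (those stages are stubbed out here); the use of networkx and the input format are illustrative assumptions.

    ```python
    from itertools import combinations
    import networkx as nx

    # Assumed input: one list of resolved entity strings per tweet.
    tweets_entities = [
        ["louvre", "paris", "mona lisa"],
        ["paris", "eiffel tower"],
        ["mona lisa", "leonardo da vinci"],
    ]

    G = nx.Graph()
    for entities in tweets_entities:
        # Link every pair of entities that co-occur in the same tweet.
        for a, b in combinations(set(entities), 2):
            if G.has_edge(a, b):
                G[a][b]["weight"] += 1  # count co-occurrences across tweets
            else:
                G.add_edge(a, b, weight=1)

    print(G.number_of_nodes(), "entities,", G.number_of_edges(), "links")
    ```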

  • English
    Authors: 
    Jihye Moon; Posada-Quintero, Hugo F.; Chon, Ki H.;
    Publisher: Code Ocean

    We have developed a literature embedding model to identify significant cardiovascular disease (CVD) risk factors and associated information. Our model is trained on literature data and retrieves CVD risk factors and significant information related to a given query. It can also support CVD prediction on cohort data through feature selection (FS) and dimensionality reduction (DR). This capsule provides all procedures for literature data collection and pre-processing, the literature model training process, CVD risk factor identification, and FS and DR applications for CVD prediction on cohort data.
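
    A minimal sketch of the retrieval idea, assuming word embeddings trained on a (toy) literature corpus with gensim's Word2Vec; candidate risk-factor terms are ranked by similarity to a query term. The corpus, terms, and parameters are illustrative, not the authors' actual pipeline.

    ```python
    from gensim.models import Word2Vec

    # Toy stand-in for a pre-processed literature corpus (list of token lists).
    corpus = [
        ["hypertension", "increases", "cvd", "risk"],
        ["smoking", "is", "a", "major", "cvd", "risk", "factor"],
        ["diabetes", "and", "obesity", "are", "linked", "to", "cvd"],
    ] * 50  # repeat so the toy model has something to learn from

    model = Word2Vec(corpus, vector_size=32, window=3, min_count=1, seed=1)

    # Rank candidate terms by embedding similarity to the query term.
    candidates = ["hypertension", "smoking", "diabetes", "obesity"]
    ranked = sorted(candidates,
                    key=lambda t: model.wv.similarity("cvd", t),
                    reverse=True)
    print(ranked)
    ```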

  • English
    Authors: 
    Rodrawangpai, Ben; Witawat Daungjaiboon;
    Publisher: Code Ocean

    We propose a new text classification model that adds layer normalization followed by dropout layers to a pre-trained transformer model. This code is part of our paper entitled "Improving text classification with Transformers and Layer Normalization", to be published in the Elsevier journal Machine Learning with Applications.
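
    A minimal sketch of the described architecture using PyTorch and Hugging Face Transformers; the encoder name, dropout rate, and use of the [CLS] representation are assumptions, as the paper's exact configuration is not given here.

    ```python
    import torch.nn as nn
    from transformers import AutoModel

    class NormDropClassifier(nn.Module):
        """Pre-trained transformer + LayerNorm + Dropout classification head."""

        def __init__(self, model_name="bert-base-uncased",
                     num_labels=2, dropout=0.3):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(model_name)
            hidden = self.encoder.config.hidden_size
            self.norm = nn.LayerNorm(hidden)   # layer normalization ...
            self.drop = nn.Dropout(dropout)    # ... followed by dropout
            self.classifier = nn.Linear(hidden, num_labels)

        def forward(self, input_ids, attention_mask):
            out = self.encoder(input_ids=input_ids,
                               attention_mask=attention_mask)
            pooled = out.last_hidden_state[:, 0]  # [CLS] token representation
            return self.classifier(self.drop(self.norm(pooled)))
    ```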

  • English
    Authors: 
    Yahav, Inbal; Chriqui, Avihay;
    Publisher: Code Ocean

    Sentiment analysis of user-generated content (UGC) can provide valuable information across numerous domains, including marketing, psychology, and public health. Currently, there are very few Hebrew models for natural language processing in general, and for sentiment analysis in particular; indeed, it is not straightforward to develop such models because Hebrew is a Morphologically Rich Language (MRL) with challenging characteristics. Moreover, the only available Hebrew sentiment analysis model, based on a recurrent neural network, was developed for polarity analysis (classifying text as “positive”, “negative”, or “neutral”) and was not used for detection of finer-grained emotions (e.g., anger, fear, joy). To address these gaps, this paper introduces HeBERT and HebEMO. HeBERT is a Transformer-based model for modern Hebrew text, which relies on the BERT (Bidirectional Encoder Representations from Transformers) architecture. BERT has been shown to outperform alternative architectures in sentiment analysis and is suggested to be particularly appropriate for MRLs. Analyzing multiple BERT specifications, we arrive at a language model that outperforms all existing Hebrew alternatives on multiple language tasks.
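
    If the released checkpoint is available on the Hugging Face hub, a sentiment pipeline might be used as sketched below; the model identifier is an assumption and should be verified against the authors' release.

    ```python
    from transformers import pipeline

    # Model identifier is an assumption; verify against the authors' release.
    classifier = pipeline(
        "sentiment-analysis",
        model="avichr/heBERT_sentiment_analysis",
    )

    # Classify a Hebrew sentence ("the movie was excellent").
    print(classifier("הסרט היה מצוין"))
    ```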

  • English
    Authors: 
    Zanyar Mohammady; Safari, Leila;
    Publisher: Code Ocean

    This code relates to the article "A Semi-supervised Method to Generate Persian Dataset for Suggestions Classification", which has not yet been published. The article contributes the following:
    • A general two-step method for tagging data to classify Persian suggestions
    • A standard guide for generating datasets for other NLP tasks in Persian
    • A new Persian dataset for suggestion classification tasks
    • A basis for classifying suggestions in the Persian dataset

  • English
    Authors: 
    Benchimol, Jonathan; Kazinnik, Sophia; Saadon, Yossi;
    Publisher: Code Ocean

    We review several existing text analysis methodologies and explain their formal application processes using the open-source software R and relevant packages. Several text mining applications to analyze central bank texts are presented.

  • English
    Authors: 
    Xiaofeng Liu;
    Publisher: Code Ocean

    A hybrid embedding-based text representation for hierarchical multi-label text classification (HMTC).

  • English
    Authors: 
    Pisařovic, Ivo; František Dařena; Procházka, David; Janiš, Vít;
    Publisher: Code Ocean

    Every larger organisation must establish a set of normative documents to control its processes and describe solutions to common problems. These documents are usually formally written and hard to read, which creates the need for dedicated customer services. Nowadays, many companies are developing chatbots to automate first-line customer support. If a company does not have a large question-answer dataset with which to build a chatbot, questions can instead be answered automatically from the documents themselves. However, we found that such automatic answering usually does not work well on normative documents. In this paper, we describe a novel method for preprocessing normative documents so that they can be used for automatic question answering. Our method efficiently exploits the strict document structure that is typical of normative documents. We increased recall from 35% to 84% (for paragraph-size answers) on selected normative documents from the university and banking domains.
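
    A generic sketch of structure-aware paragraph retrieval: numbered paragraphs are split out with a regular expression, then ranked by TF-IDF similarity to the question. This illustrates the idea of exploiting strict document structure, not the authors' actual method; the sample document and regex are assumptions.

    ```python
    import re
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    document = """1.1 Students must register for exams two weeks in advance.
    1.2 Late registration requires approval by the faculty office.
    2.1 Tuition fees are due at the start of each semester."""

    # Exploit the strict structure: split on numbered paragraph headings.
    paragraphs = re.split(r"(?m)^\s*(?=\d+\.\d+\s)", document)
    paragraphs = [p.strip() for p in paragraphs if p.strip()]

    question = "When do I have to register for exams?"

    # Rank paragraph-size candidates by TF-IDF similarity to the question.
    vec = TfidfVectorizer()
    matrix = vec.fit_transform(paragraphs + [question])
    scores = cosine_similarity(matrix[-1], matrix[:-1])[0]
    print(paragraphs[scores.argmax()])  # best-matching paragraph-size answer
    ```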

  • English
    Authors: 
    Kanakaris, Nikos; Giarelis, Nikolaos; Siachos, Ilias; Karacapilidis, Nikos;
    Publisher: Code Ocean

    This paper employs techniques and algorithms from the fields of natural language processing, graph representation learning and word embeddings to assist project managers in the task of personnel selection. To do so, our approach initially represents multiple textual documents as a single graph. Then, it computes word embeddings through representation learning on graphs and performs feature selection. Finally, it builds a classification model that is able to estimate how qualified a candidate employee is to work on a given task, taking as input only the descriptions of the tasks and a list of word embeddings. Our approach differs from the existing ones in that it does not require the calculation of key performance indicators or any other form of structured data in order to operate properly. For our experiments, we retrieved data from the Jira issue tracking system of the Apache Software Foundation. The evaluation results show, in most cases, an increase of 0.43% in the accuracy of the proposed classification models when compared against a widely-adopted baseline method, while their validation loss is significantly decreased by 65.54%.
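
    A minimal sketch of the documents-as-graph idea: the documents are merged into one word co-occurrence graph, and node embeddings are then learned DeepWalk-style by running word2vec over short random walks. The graph construction, walk length, and parameters are illustrative assumptions, not the authors' exact pipeline.

    ```python
    import random
    import networkx as nx
    from gensim.models import Word2Vec

    docs = ["fix login bug in auth service",
            "add unit tests for auth service",
            "refactor login page layout"]

    # Merge all documents into a single word co-occurrence graph (window of 1).
    G = nx.Graph()
    for doc in docs:
        words = doc.split()
        G.add_edges_from(zip(words, words[1:]))

    # DeepWalk-style embeddings: word2vec over short random walks.
    random.seed(0)
    walks = []
    for _ in range(200):
        node = random.choice(list(G.nodes))
        walk = [node]
        for _ in range(5):
            node = random.choice(list(G.neighbors(node)))
            walk.append(node)
        walks.append(walk)

    emb = Word2Vec(walks, vector_size=16, window=2, min_count=1, seed=0)
    print(emb.wv.most_similar("auth", topn=3))
    ```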
