22 Research products, page 1 of 3
- Research software . 2022 . Open Access . English . Authors: Jinhang Jiang; Srinivasan, Karthik . Publisher: Code Ocean
MoreThanSentiments (Jiang and Srinivasan, 2022) is a Python library that helps researchers calculate Boilerplate (Lang and Stice-Lawrence, 2015), Redundancy (Cazier and Pfeiffer, 2017), Specificity (Hope et al., 2016), Relative Prevalence (Blankespoor, 2019), and related measures. It is motivated by the idea that properly quantifying text structure helps researchers extract more meaningful information. The package is domain-independent and easy to integrate into various projects for text quantification tasks.
- Research software . 2022 . English . Authors: Amirhosein Bodaghi . Publisher: Code Ocean
This code takes a set of tweets as input and produces a semantic graph of the relationships between the entities in the tweets' text. It first applies a series of text-cleaning steps, then performs entity extraction and resolution in multiple stages. Finally, it builds a graph in which nodes represent entities and a link between two nodes indicates that those entities co-occur in at least one tweet of the input data.
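The final graph-building step described above can be sketched as follows. This is a minimal illustration, not the capsule's actual code: it assumes entity extraction and resolution have already produced one list of entity names per tweet, and the function name `build_cooccurrence_graph` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(tweet_entities):
    """Build an undirected graph as an adjacency mapping.

    tweet_entities: iterable of lists, one list of resolved entity
    names per tweet (assumed to come from an earlier extraction step).
    Two entities are linked if they appear together in at least one tweet.
    """
    graph = defaultdict(set)
    for entities in tweet_entities:
        unique = sorted(set(entities))
        for name in unique:
            graph[name]  # touch the key so isolated entities still become nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Hypothetical input: entities already extracted from three tweets.
tweets = [
    ["NASA", "Mars"],
    ["Mars", "SpaceX", "NASA"],
    ["Berlin"],
]
graph = build_cooccurrence_graph(tweets)
```

An adjacency mapping of sets keeps the sketch dependency-free; a graph library could be substituted if edge weights (for example, the number of tweets in which a pair co-occurs) are needed.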
- Research software . 2022 . English . Authors: Jihye Moon; Posada-Quintero, Hugo F.; Chon, Ki H. . Publisher: Code Ocean
We have developed a literature embedding model to identify significant cardiovascular disease (CVD) risk factors and associated information. The model is trained on literature data and retrieves CVD risk factors and significant information related to a given query. It can be used alongside CVD prediction on cohort data for feature selection (FS) and dimensionality reduction (DR). This capsule provides all procedures for literature data collection and pre-processing, literature model training, CVD risk factor identification, and the FS and DR applications for CVD prediction on cohort data.
- Research software . 2022 . English . Authors: Rodrawangpai, Ben; Witawat Daungjaiboon . Publisher: Code Ocean
We propose a new text classification model that adds layer normalization, followed by Dropout layers, to a pre-trained transformer model. This code is part of our paper "Improving text classification with Transformers and Layer Normalization", which is to be published in the Elsevier journal "Machine Learning with Applications".
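To make the two added operations concrete, here is a small dependency-free sketch of layer normalization followed by dropout applied to a single feature vector. It is an illustration only: the listing does not describe the paper's exact architecture, the vector `pooled` is a made-up stand-in for a pre-trained encoder's output, and the learnable gain and bias that layer normalization normally carries are omitted for brevity.

```python
import math
import random


def layer_norm(vector, eps=1e-5):
    """Normalize one feature vector to zero mean and (near) unit variance."""
    mean = sum(vector) / len(vector)
    var = sum((x - mean) ** 2 for x in vector) / len(vector)
    return [(x - mean) / math.sqrt(var + eps) for x in vector]


def dropout(vector, rate, rng, training=True):
    """Inverted dropout: zero each element with probability `rate`
    and rescale the survivors so the expected value is unchanged."""
    if not training or rate == 0.0:
        return list(vector)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vector]


# Hypothetical stand-in for the pooled output of a pre-trained transformer.
pooled = [0.5, -1.0, 2.0, 0.0]
normalized = layer_norm(pooled)
regularized = dropout(normalized, rate=0.1, rng=random.Random(0))
```

In a real model these steps would be framework layers placed between the encoder and the classification head, with dropout active only during training.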
- Research software . 2022 . English . Authors: Yahav, Inbal; Chriqui, Avihay . Publisher: Code Ocean
Sentiment analysis of user-generated content (UGC) can provide valuable information across numerous domains, including marketing, psychology, and public health. Currently, there are very few Hebrew models for natural language processing in general, and for sentiment analysis in particular; indeed, it is not straightforward to develop such models because Hebrew is a Morphologically Rich Language (MRL) with challenging characteristics. Moreover, the only available Hebrew sentiment analysis model, based on a recurrent neural network, was developed for polarity analysis (classifying text as "positive", "negative", or "neutral") and was not used for detection of finer-grained emotions (e.g., anger, fear, joy). To address these gaps, this paper introduces HeBERT and HebEMO. HeBERT is a Transformer-based model for modern Hebrew text, which relies on a BERT (Bidirectional Encoder Representations from Transformers) architecture. BERT has been shown to outperform alternative architectures in sentiment analysis, and is suggested to be particularly appropriate for MRLs. After analyzing multiple BERT specifications, we arrive at a language model that outperforms all existing Hebrew alternatives on multiple language tasks.
- Research software . 2022 . English . Authors: Zanyar Mohammady; Safari, Leila . Publisher: Code Ocean
This code accompanies the article "A Semi-supervised Method to Generate Persian Dataset for Suggestions Classification", which has not been published yet. The article makes the following contributions:
  - A general two-step method for tagging data to classify Persian suggestions
  - A standard guide for generating datasets for other NLP tasks in Persian
  - A new Persian dataset for suggestion classification tasks
  - A basis for classifying suggestions in the Persian dataset
- Research software . 2022 . English . Authors: Benchimol, Jonathan; Kazinnik, Sophia; Saadon, Yossi . Publisher: Code Ocean
We review several existing text analysis methodologies and explain their formal application processes using the open-source software R and relevant packages. Several text mining applications to analyze central bank texts are presented.
- Research software . 2022 . English . Authors: Xiaofeng Liu . Publisher: Code Ocean
A hybrid embedding-based text representation for HMTC
- Research software . 2021 . English . Authors: Pisařovic, Ivo; František Dařena; Procházka, David; Janiš, Vít . Publisher: Code Ocean
Every larger organisation must establish a set of normative documents to control its processes and describe solutions to common problems. These documents are usually formally written and hard to read, which creates the need for various customer services. Nowadays, many companies are developing chatbots to automate first-line customer support. If a company does not have a large question-answer dataset to build a chatbot, questions can be answered automatically, directly from the documents. However, we found that automatic answering usually does not work well on normative documents. In this paper, we describe a novel method for preprocessing normative documents so that they can be used for such automatic question answering. Our method efficiently exploits the strict document structure that is typical of normative documents. We increased the recall from 35% to 84% (for paragraph-size answers) on selected normative documents from the university and bank domains.
- Research software . 2021 . English . Authors: Kanakaris, Nikos; Giarelis, Nikolaos; Siachos, Ilias; Karacapilidis, Nikos . Publisher: Code Ocean
This paper employs techniques and algorithms from the fields of natural language processing, graph representation learning and word embeddings to assist project managers in the task of personnel selection. To do so, our approach initially represents multiple textual documents as a single graph. Then, it computes word embeddings through representation learning on graphs and performs feature selection. Finally, it builds a classification model that is able to estimate how qualified a candidate employee is to work on a given task, taking as input only the descriptions of the tasks and a list of word embeddings. Our approach differs from the existing ones in that it does not require the calculation of key performance indicators or any other form of structured data in order to operate properly. For our experiments, we retrieved data from the Jira issue tracking system of the Apache Software Foundation. The evaluation results show, in most cases, an increase of 0.43% in the accuracy of the proposed classification models when compared against a widely-adopted baseline method, while their validation loss is significantly decreased by 65.54%.