Advanced search in Research products
86 Research products, page 1 of 9

  • Digital Humanities and Cultural Heritage
  • Publications
  • Research data
  • Research software
  • Other research products
  • 2018-2022
  • Article
  • Scientometrics

  • Closed Access
    Authors: 
    Mingyang Wang; Jiaqi Zhang; Shijia Jiao; Xiangrong Zhang; Na Zhu; Guangsheng Chen;
    Publisher: Springer Science and Business Media LLC

    Citations are not equally important. Researchers have presented different models and techniques to identify important citations. However, the features used in these works are relatively limited, so they cannot achieve good recognition performance. This paper proposed a new machine learning framework to distinguish important and non-important citations by examining the syntactic and contextual information of citations. Syntactic features reflect the statistical characteristics of citation behavior, such as the cited frequency and citation position of the cited article in the citing ones. Contextual features reflect the semantic content of citations, such as their intent and polarity. Three feature selection algorithms, Pearson correlation coefficient, relief-F and entropy weight method, were used to calculate the contribution of each index to distinguishing different kinds of citations. On this basis, key features that can better identify the important citations were screened out. Three classifiers, support vector machine, KNN and random forest, were used to test the classification performance of these key features. The experiment was performed on two annotated benchmark datasets. It showed that the framework proposed in this paper can achieve better classification performance compared with contemporary state-of-the-art research. The syntactic and contextual features of citations are of great value in identifying important citations.

  • Closed Access
    Authors: 
    YiJun Liu; Li Zhang; Xiaoli Lian;
    Publisher: Springer Science and Business Media LLC

    Keywords, serving as a dense summary of documents, are widely used in search engines and libraries for information retrieval, content classification, speech recognition and automated text summarization. However, many documents lack keywords, and the large amount of content generated every day makes human annotation very time-consuming. Much research shows that network-based approaches perform remarkably well at extracting text keywords. Traditionally, words are connected based upon their occurrence in documents. One recent work shows the significant influence of sentences on keyword extraction, beyond the traditional methods that consider only words. In addition to words and sentences, chapters are essential parts that organize the higher-level semantic logic of documents. Inspired by this idea, we assume that chapters should contribute to keyword extraction too. We add the chapter factor to build a three-layer network model and propose a Word-Sentence-Chapter network-based approach for keyword extraction. Two experiments, with Chinese and English documents respectively, indicate that our approach outperforms the state of the art.

  • Publication . Preprint . Article . 2018
    Open Access

    Nature has recently published a Correspondence claiming the absence of fame biases in editorial choice. The topic is interesting and deserves a deeper analysis than was presented, because the reported brief analysis and its conclusion are somewhat biased for many reasons, some of which are discussed here. Since the editorial assessment is a form of peer review, the biases reported for external peer review would, thus, apply to the editorial assessment, too. The biases would be proportional to the elitist level of a journal; the more elitist a journal, the more biased its decisions, unavoidably. The bias could be intentional or unintentional, conscious or subconscious, reflecting our imperfect human nature.

  • Closed Access
    Authors: 
    Andrés Carvallo; Denis Parra; Hans Lobel; Alvaro Soto;
    Publisher: Springer Science and Business Media LLC

    Document screening is a fundamental task within Evidence-based Medicine (EBM), a practice that provides scientific evidence to support medical decisions. Several approaches have tried to reduce physicians’ workload of screening and labeling vast amounts of documents to answer clinical questions. Previous works tried to semi-automate document screening, reporting promising results, but their evaluation was conducted on small datasets, which hinders generalization. Moreover, recent works in natural language processing have introduced neural language models, but none have compared their performance in EBM. In this paper, we evaluate the impact of several document representations such as TF-IDF along with neural language models (BioBERT, BERT, Word2Vec, and GloVe) on an active learning-based setting for document screening in EBM. Our goal is to reduce the number of documents that physicians need to label to answer clinical questions. We evaluate these methods using both a small, challenging dataset (CLEF eHealth 2017) and a larger one that is easier to rank (Epistemonikos). Our results indicate that word as well as textual neural embeddings always outperform the traditional TF-IDF representation. When comparing among neural and textual embeddings, in the CLEF eHealth dataset the models BERT and BioBERT yielded the best results. On the larger dataset, Epistemonikos, Word2Vec and BERT were the most competitive, showing that BERT was the most consistent model across different corpora. In terms of active learning, an uncertainty sampling strategy combined with a logistic regression achieved the best performance overall, above other methods under evaluation, and in fewer iterations. Finally, we compared the results of our best models, trained using active learning, with other authors' methods from CLEF eHealth, showing better results in terms of work saved for physicians in the document-screening task.

  • Open Access English
    Authors: 
    Zhiqi Wang; Ronald Rousseau;
    Publisher: Springer International Publishing
    Country: Belgium

    Abstract: The Yule-Simpson paradox refers to the fact that outcomes of comparisons between groups are reversed when groups are combined. Using country-level data from Essential Science Indicators, a part of InCites (Clarivate), it is shown that although the Yule-Simpson phenomenon in citation analysis and research evaluation is not common, it isn't extremely rare either. The Yule-Simpson paradox is a phenomenon one should be aware of, otherwise one may encounter unforeseen surprises in scientometric studies.

  • Publication . Conference object . Other literature type . Article . 2020
    Open Access
    Authors: 
    Wieland, Martin; Gorraiz, Juan;
    Publisher: Springer Science and Business Media LLC
    Country: Austria

    Abstract: From a historical point of view, Rome and especially the University of La Sapienza, are closely linked to two geniuses of Baroque art: Bernini and Borromini. In this study, we analyze the rivalry between them from a scientometric perspective. This study also serves as a basis for exploring which data sources may be appropriate for broad impact assessment of individuals and/or celebrities. We pay special attention to encyclopaedias, library catalogues and other databases or types of publications that are not normally used for this purpose. The results show that some sources such as Wikipedia are not exploited according to the possibilities they offer, especially those related to different languages and cultures. Moreover, analyses are often reduced to a minimum number of data sources, which can distort the relevance of the outcome. Our results show that other sources normally not considered for this purpose, like JSTOR, PQDT, Google Scholar, Catalogue Holdings, etc. can provide more relevant or abundant information than the typically used Web of Science Core Collection and Scopus. Finally, we also contrast opportunities and limitations of old and new (YouTube, Twitter) data sources (particularly the quality and accuracy of the search methods). Much room for improvement has been identified in order to use data sources more efficiently and with higher accuracy.

  • Closed Access
    Authors: 
    Imran Ihsan; M. Abdul Qadir;
    Publisher: Springer Science and Business Media LLC

    In recent scientific advances, Artificial Intelligence and Natural Language Processing are the major contributors to classifying documents and extracting information. Classifying citations into different classes has gathered a lot of attention due to the large volume of citations available in different digital libraries. Typical citation classification uses sentiment analysis, where various techniques are applied to citation texts, mainly to classify them into “Positive”, “Negative” and “Neutral” sentiments. However, there can be innumerable reasons why an author selects another piece of research for citation. Citations’ Context and Reasons Ontology (CCRO) uses a clear scientific method to articulate eight basic reasons for citing by using an iterative process of sentiment analysis, collaborative meanings, and experts' opinions. Using CCRO, this research paper adopts an ontology-based approach to extract citations' reasons and instantiate ontology classes and properties on two different corpora of citation sentences. One corpus of citation sentences is a publicly available dataset, while the other is our own manually curated corpus. The process uses a two-step approach. The first part is an interface to manually annotate each citation text in the selected corpora on CCRO properties. A team of carefully selected annotators has annotated each citation to achieve a high inter-annotator agreement. The second part focuses on the automatic extraction of these reasons. Using Natural Language Processing, Mapping Graph, and Reporting Verb in a citation sentence, the citation's reason is extracted and mapped onto a CCRO property. After comparing both manual and automatic mapping, accuracy is calculated. Based on experiments and results, accuracy is calculated for both the publicly available corpus and our own corpus of citation sentences.

  • Open Access
    Authors: 
    Iman Tahamtan; Lutz Bornmann;
    Publisher: arXiv

    The purpose of this paper is to update the review of Bornmann and Daniel (2008) presenting a narrative review of studies on citations in scientific documents. The current review covers 41 studies published between 2006 and 2018. Bornmann and Daniel (2008) focused on earlier years. The current review describes the (new) studies on citation content and context analyses as well as the studies that explore the citation motivation of scholars through surveys or interviews. One focus in this paper is on the technical developments in the last decade, such as the richer meta-data available and machine-readable formats of scientific papers. These developments have resulted in citation context analyses of large datasets in comprehensive studies (which was not possible previously). Many studies in recent years have used computational and machine learning techniques to determine citation functions and polarities, some of which have attempted to overcome the methodological weaknesses of previous studies. The automated recognition of citation functions seems to have the potential to greatly enhance citation indices and information retrieval capabilities. Our review of the empirical studies demonstrates that a paper may be cited for very different scientific and non-scientific reasons. This result accords with the finding by Bornmann and Daniel (2008). The current review also shows that to better understand the relationship between citing and cited documents, a variety of features should be analyzed, primarily the citation context, the semantics and linguistic patterns in citations, citation locations within the citing document, and citation polarity (negative, neutral, positive).
    Comment: 56 pages, 4 figures, 11 tables

  • Closed Access
    Authors: 
    Kiran Sharma;
    Publisher: Springer Science and Business Media LLC

    The growth of retraction databases reveals a disturbing trend in science, and the rising trend of citations of retracted papers is a serious concern. The objective of the study is to investigate the patterns of retractions through team size and retracted citations. The publication records of 12,231 retracted papers indexed by Web of Science (WoS) are analyzed to investigate (i) the patterns of retraction associated with collaboration and team size; and (ii) the impact of retracted papers on the papers that cite them (retracted citations). The study demonstrates the collaboration patterns of retracted publications where 61.5% of authors have only one and 24.6% have two retracted papers; however, 2% of authors have more than retracted papers. Also, the temporal evolution of the team size reveals that smaller teams have more retractions. The impact of citing retracted papers reveals that 55.2% of retracted papers have been cited at least once. One quarter of the citations to the retracted papers are self-citations which themselves are retractions. On average, 71.4% of citations are non-retracted citations and 28.6% are retracted citations, which are mostly self-citations. Last, the variation in average team size and average retracted citations in various research areas (having high retraction) is presented. Retracted publications in high-impact journals are highly cited.

  • Open Access
    Authors: 
    Mei Hsiu-Ching Ho; John S. Liu;
    Publisher: Springer Science and Business Media LLC

    Scholars all over the world have produced a large body of COVID-19 literature in an exceptionally short period after the outbreak of this rapidly-spreading virus. An analysis of the literature accumulated in the first 150 days hints that the rapid knowledge accumulation in its early-stage development was expedited through a wide variety of journal platforms, a sense and pressure of national urgency, and inspiration from journal editorials.
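
The Yule-Simpson paradox mentioned in the Wang and Rousseau entry above (a comparison between groups reverses when the groups are combined) may be easier to see with numbers. The sketch below is only an illustration: the two countries, the two fields, and every paper and citation count are hypothetical values chosen for this example, not data from Essential Science Indicators or from the paper. It compares citations per paper field by field and then for all fields together.

```python
# Hypothetical (papers, citations) counts per country and field.
# These numbers are invented for illustration; they are not real ESI data.
DATA = {
    "A": {"field_1": (10, 200), "field_2": (90, 450)},
    "B": {"field_1": (90, 1620), "field_2": (10, 40)},
}


def citations_per_paper(papers, citations):
    """Average number of citations per paper."""
    return citations / papers


def per_field(country):
    """Citations per paper for each field of one country."""
    return {
        field: citations_per_paper(papers, citations)
        for field, (papers, citations) in DATA[country].items()
    }


def combined(country):
    """Citations per paper for one country with all fields pooled."""
    papers = sum(p for p, _ in DATA[country].values())
    citations = sum(c for _, c in DATA[country].values())
    return citations_per_paper(papers, citations)


def shows_reversal(x, y):
    """True if x beats y in every field but loses once fields are pooled."""
    fx, fy = per_field(x), per_field(y)
    x_wins_every_field = all(fx[field] > fy[field] for field in fx)
    return x_wins_every_field and combined(x) < combined(y)


if __name__ == "__main__":
    for country in DATA:
        print(country, per_field(country), round(combined(country), 2))
    print("Reversal when fields are combined:", shows_reversal("A", "B"))
```

In this made-up case country A has more citations per paper than country B in each field, yet B comes out ahead overall, because most of B's papers sit in the highly cited field while most of A's sit in the less cited one.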