Advanced search in Research products
The following results are related to Digital Humanities and Cultural Heritage.
4 Research products, page 1 of 1

  • Digital Humanities and Cultural Heritage
  • Publications
  • Research data
  • Research software
  • Other research products
  • 2018-2022
  • Open Access
  • Article
  • IT
  • EU
  • Scientometrics

  • Open Access
    Authors: 
    Kun Sun; Haitao Liu; Wenxin Xiong;
    Publisher: Zenodo
    Project: EC | WIDE (742545)

    Abstract: Scientific writings, as one essential part of human culture, have evolved over centuries into their current form. Knowing how scientific writings evolved is particularly helpful in understanding how trends in scientific culture developed. It also allows us to better understand how scientific culture was interwoven with human culture generally. The availability of massive digitized texts and the progress in computational technologies today provide us with a convenient and credible way to discern the evolutionary patterns in scientific writings by examining the diachronic linguistic changes. The linguistic changes in scientific writings reflect the genre shifts that took place with historical changes in science and scientific writings. This study investigates a general evolutionary linguistic pattern in scientific writings. It does so by merging two credible computational methods: relative entropy, and word-embedding concreteness and imageability. It thus creates a novel quantitative methodology and applies this to the examination of diachronic changes in the Philosophical Transactions of the Royal Society (PTRS, 1665–1869). The data from the two computational approaches can be well mapped to support the argument that this journal followed the evolutionary trend of increasing professionalization and specialization. But it also shows that language use in this journal was greatly influenced by historical events and other socio-cultural factors. This study, as a “culturomic” approach, demonstrates that the linguistic evolutionary patterns in scientific discourse have been interrupted by external factors even though this scientific discourse would likely have cumulatively developed into a professional and specialized genre. The approaches proposed by this study can make a great contribution to full-text analysis in scientometrics.

  • Open Access English
    Authors: 
    Stefano Mammola; Diego Fontaneto; Alejandro Martínez; Filipe Chichorro;
    Countries: Finland, Italy

    Abstract: Many believe that the quality of a scientific publication is as good as the science it cites. However, quantifications of how features of reference lists affect citations remain sparse. We examined seven numerical characteristics of reference lists of 50,878 research articles published in 17 ecological journals between 1997 and 2017. Over this period, significant changes occurred in reference lists’ features. On average, more recent papers have longer reference lists and cite more high Impact Factor papers and fewer non-journal publications. We also show that highly cited articles across the ecological literature have longer reference lists, cite more recent and impactful references, and include more self-citations. Conversely, the proportion of ‘classic’ papers and non-journal publications cited, as well as the temporal span of the reference list, have no significant influence on articles’ citations. From this analysis, we distill a recipe for crafting impactful reference lists, at least in ecology.

  • Open Access English
    Authors: 
    Moreno La Quatra; Luca Cagliero; Elena Baralis;
    Publisher: Springer
    Country: Italy

    The ever-increasing number of published scientific articles has prompted the need for automated, data-driven approaches to summarizing the content of scientific articles. The Computational Linguistics Scientific Document Summarization Shared Task (CL-SciSumm 2019) has recently fostered the study and development of new text mining and machine learning solutions to the summarization problem customized to the academic domain. In CL-SciSumm, a Reference Paper (RP) is associated with a set of Citing Papers (CPs), all containing citations to the RP. In each CP, the text spans (i.e., citances) have been identified that pertain to a particular citation to the RP. The task of identifying the spans of text in the RP that most accurately reflect the citance is addressed using supervised approaches. This paper proposes a new, more effective solution to the CL-SciSumm discourse facet classification task, which entails identifying for each cited text span what facet of the paper it belongs to from a predefined set of facets. It also proposes to extend the set of traditional CL-SciSumm tasks with a new one, namely the discourse facet summarization task. The underlying idea is to extract facet-specific descriptions of each RP consisting of a fixed-length collection of the RP’s text spans. To tackle both the standard and the new tasks, we propose machine learning supported solutions based on the extraction of a selection of discriminating words, called pivot words. Predictive features based on pivot words are shown to be of great importance to rate the pertinence and relevance of a text span to a given facet. The newly proposed facet classification method performs significantly better than the best performing CL-SciSumm 2019 participant (i.e., the classification accuracy has increased by +8%), whereas regression methods achieved promising results for the newly proposed summarization task.

  • Open Access English
    Authors: 
    Camil Demetrescu; Andrea Ribichini; Marco Schaerf;
    Country: Italy
    Project: EC | SecondHands (643950)

    We investigate the accuracy of how author names are reported in bibliographic records excerpted from four prominent sources: WoS, Scopus, PubMed, and CrossRef. We take as a case study 44,549 publications stored in the internal database of Sapienza University of Rome, one of the largest universities in Europe. While our results indicate generally good accuracy for all bibliographic data sources considered, we highlight a number of issues that undermine the accuracy for certain classes of author names, including compound names and names with diacritics, which are common features of Italian and other Western languages.
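
The last abstract above points to compound names and names with diacritics as sources of mismatches between bibliographic records. The following is a minimal sketch of how such variants might be compared, assuming Unicode NFKD normalization and a simple hyphen/whitespace rule. The function names (`strip_diacritics`, `normalize_name`, `names_match`) are hypothetical and are not taken from the paper, which does not describe its matching procedure in the abstract.

```python
import unicodedata


def strip_diacritics(name: str) -> str:
    """Remove combining marks after NFKD decomposition (e.g. 'í' becomes 'i')."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_name(name: str) -> str:
    """Illustrative normalization (an assumption, not the paper's method):
    strip diacritics, casefold, treat hyphens as spaces, collapse whitespace."""
    plain = strip_diacritics(name).casefold()
    return " ".join(plain.replace("-", " ").split())


def names_match(a: str, b: str) -> bool:
    """Return True when two name strings are equal after normalization."""
    return normalize_name(a) == normalize_name(b)


if __name__ == "__main__":
    # A diacritic variant and a hyphenated compound-name variant.
    print(names_match("Alejandro Martínez", "alejandro martinez"))
    print(names_match("Jean-Pierre Dupont", "Jean Pierre  Dupont"))
```

A rule this loose trades precision for recall: it merges spelling variants of one author, but it can also merge distinct authors whose names differ only by a diacritic, so a real comparison would likely need additional evidence such as affiliation or identifiers.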
