Publisher: Springer Science and Business Media LLC
Project: EC | WIDE (742545)
Abstract: Scientific writings, as one essential part of human culture, have evolved over centuries into their current form. Knowing how scientific writings evolved is particularly helpful in understanding how trends in scientific culture developed. It also allows us to better understand how scientific culture was interwoven with human culture generally. The availability of massive digitized texts and the progress in computational technologies today provide us with a convenient and credible way to discern the evolutionary patterns in scientific writings by examining diachronic linguistic changes. These linguistic changes reflect the genre shifts that took place with historical changes in science and scientific writing. This study investigates a general evolutionary linguistic pattern in scientific writings. It does so by merging two credible computational methods: relative entropy, and word-embedding concreteness and imageability. It thus creates a novel quantitative methodology and applies it to the examination of diachronic changes in the Philosophical Transactions of the Royal Society (PTRS, 1665–1869). The data from the two computational approaches can be well mapped to support the argument that this journal followed the evolutionary trend of increasing professionalization and specialization. But the data also show that language use in this journal was greatly influenced by historical events and other socio-cultural factors. This study, as a “culturomic” approach, demonstrates that the linguistic evolutionary patterns in scientific discourse have been interrupted by external factors, even though this scientific discourse would likely have cumulatively developed into a professional and specialized genre. The approaches proposed by this study can make a great contribution to full-text analysis in scientometrics.
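For readers unfamiliar with relative entropy (Kullback-Leibler divergence), the following is a minimal sketch of how it can compare word usage between two text samples, for example two periods of a journal. It is not the study's implementation: the function name `relative_entropy`, the unigram model, and the add-alpha smoothing are assumptions made only for illustration.

```python
import math
from collections import Counter


def relative_entropy(tokens_a, tokens_b, alpha=1.0):
    """Return D(A || B) in bits between the unigram distributions of two token lists.

    Illustrative assumptions: unigram counts, and add-alpha smoothing over the
    joint vocabulary so that no probability is zero.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    total_a = len(tokens_a) + alpha * len(vocab)
    total_b = len(tokens_b) + alpha * len(vocab)

    divergence = 0.0
    for word in vocab:
        p = (counts_a[word] + alpha) / total_a  # smoothed probability in sample A
        q = (counts_b[word] + alpha) / total_b  # smoothed probability in sample B
        divergence += p * math.log2(p / q)
    return divergence


# Toy samples standing in for an earlier and a later period (invented text).
earlier = "the air pump was tried before the society".split()
later = "the spectra of the compounds were measured".split()
score = relative_entropy(later, earlier)
```

A larger value means the first sample's word usage is less predictable from the second. The measure is asymmetric, so the order of the arguments matters.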
Creating links manually between large datasets is an extremely tedious task. Although linked data production is growing massively, the interlinking of datasets still needs improvement. This paper presents our work on detecting and extending links between Wikidata and COURAGE entities with respect to cultural heritage data. The COURAGE project explored the methods of cultural opposition in the socialist era (c. 1950–1990), highlighting the variety of alternative cultural scenes that flourished in Eastern Europe before 1989. We describe our methods and results in discovering common entities in the two datasets, and our solution for automating this task. Furthermore, we show how it was possible to enrich the data in Wikidata and to establish new, bi-directional connections between COURAGE and Wikidata. Hence, the audience of both databases will have a more complete view of the matched entities.
We investigate the accuracy of how author names are reported in bibliographic records excerpted from four prominent sources: WoS, Scopus, PubMed, and CrossRef. We take as a case study 44,549 publications stored in the internal database of Sapienza University of Rome, one of the largest universities in Europe. While our results indicate generally good accuracy for all bibliographic data sources considered, we highlight a number of issues that undermine the accuracy for certain classes of author names, including compound names and names with diacritics, which are common features of Italian and other Western languages.
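To illustrate why diacritics and compound names complicate such comparisons, here is a minimal sketch of a tolerant name comparison. It is not the procedure used in the study: the helpers `normalize_name` and `same_author` are hypothetical, and the choice to strip combining marks and treat hyphens as spaces is an assumption made for illustration.

```python
import unicodedata


def normalize_name(name):
    """Return a comparison key: no combining marks, hyphens as spaces, case-folded."""
    # NFKD splits a character such as "ò" into "o" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Hyphenated and spaced compound names are treated alike; whitespace is collapsed.
    return " ".join(stripped.replace("-", " ").casefold().split())


def same_author(name_a, name_b):
    """Hypothetical check: two name strings match if their keys are equal."""
    return normalize_name(name_a) == normalize_name(name_b)


# Invented example names, not records from any of the sources discussed.
key = normalize_name("Nicolò  De-Luca")
match = same_author("Nicolò De Luca", "NICOLO DE-LUCA")
```

A key of this kind only absorbs accent and hyphenation variants. Letters that have no Unicode decomposition, such as "ø", and reordered or abbreviated name parts would need additional handling.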