Publisher: Springer Science and Business Media LLC
Project: EC | WIDE (742545)
Abstract Scientific writings, as one essential part of human culture, have evolved over centuries into their current form. Knowing how scientific writings evolved is particularly helpful in understanding how trends in scientific culture developed. It also allows us to better understand how scientific culture was interwoven with human culture generally. The availability of massive digitized texts and the progress in computational technologies today provide us with a convenient and credible way to discern the evolutionary patterns in scientific writings by examining the diachronic linguistic changes. The linguistic changes in scientific writings reflect the genre shifts that took place with historical changes in science and scientific writings. This study investigates a general evolutionary linguistic pattern in scientific writings. It does so by merging two credible computational methods: relative entropy, and word-embedding concreteness and imageability. It thus creates a novel quantitative methodology and applies this to the examination of diachronic changes in the Philosophical Transactions of the Royal Society (PTRS, 1665–1869). The data from the two computational approaches can be well mapped to support the argument that this journal followed the evolutionary trend of increasing professionalization and specialization. But it also shows that language use in this journal was greatly influenced by historical events and other socio-cultural factors. This study, as a “culturomic” approach, demonstrates that the linguistic evolutionary patterns in scientific discourse have been interrupted by external factors even though this scientific discourse would likely have cumulatively developed into a professional and specialized genre. The approaches proposed by this study can make a great contribution to full-text analysis in scientometrics.
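As a rough illustration of the relative-entropy measure mentioned in the abstract above, the sketch below computes the Kullback-Leibler divergence between the unigram distributions of two text samples. It is not the study's pipeline: the function name `relative_entropy`, the additive smoothing constant `alpha`, and the two toy token lists are assumptions made only for this example.

```python
import math
from collections import Counter


def relative_entropy(tokens_p, tokens_q, alpha=0.01):
    """Return D(P || Q) in bits between the unigram distributions of two token lists.

    Additive smoothing (alpha) is an assumption of this sketch; it keeps
    every probability non-zero so the logarithm is always defined.
    """
    counts_p, counts_q = Counter(tokens_p), Counter(tokens_q)
    vocab = set(counts_p) | set(counts_q)
    total_p = len(tokens_p) + alpha * len(vocab)
    total_q = len(tokens_q) + alpha * len(vocab)
    divergence = 0.0
    for word in vocab:
        p = (counts_p[word] + alpha) / total_p
        q = (counts_q[word] + alpha) / total_q
        divergence += p * math.log2(p / q)
    return divergence


# Hypothetical toy samples standing in for an earlier and a later period.
early_tokens = "i observed the curious stone in the garden".split()
late_tokens = "the specimen was measured and the density calculated".split()

print(relative_entropy(late_tokens, early_tokens))
```

A larger value indicates that the later sample is harder to "predict" from the earlier one, which is the intuition behind using relative entropy to track diachronic change.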
Abstract Hyphenated compounds have largely been neglected in the studies of compounding, which have seldom analysed compounds in context. In this study, we argue that the hyphen use in compounds is strongly motivated. Hyphenation is used when words form a unit, which reduces the possibility of parsing them into separate units or other forms. The current study adopts a new perspective on contextual factors, namely, which part of speech (PoS) the compound as a whole belongs to and how people correctly parse a compound into a unit. This process can be observed and analysed by considering examples. This study therefore holds that hyphenation might have gradually become a compounding technique that differs from general compounding principles. To better understand hyphenated compounds and the motivation for using hyphenation, we conduct a quantitative investigation into their distribution frequency to explore how English hyphenated compounds have been used over the last 200 years. Diachronic change in the frequency of the distribution for compounds has seldom been considered. This question is explored by using frequency data obtained from the three databases that contain hyphenated compounds. Diachronic analysis shows that the frequencies of tokens and types in hyphenated compounds have been increasing, and changes in both frequencies follow the S-curve model. Historical evidence shows that hyphenation in compounds, as an orthographic form, does not seem to disappear easily. Familiarity and economy, as suggested in the cognitive studies of compounding, cannot adequately explain this phenomenon. The three databases that we used provide cross-verification that suggests that hyphenation has evolved into a compounding technique. Language users probably unconsciously take advantage of the discriminative learning model to remind themselves that these combinations should be parsed differently. Thus the hyphenation compounding technique facilitates communication efficiency.
Overall, this study significantly enhances our understanding of the nature of compounding, the motivations for using hyphenation, and its cognitive processing.
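The S-curve model mentioned in the abstract above is commonly formalised as a logistic function. The following is a minimal sketch of that idea, assuming a known ceiling and fitting the growth rate and midpoint by ordinary least squares on the logit-transformed values. The function names, the ceiling, and the synthetic data are hypothetical and are not taken from the study or its databases.

```python
import math


def logistic(t, ceiling, rate, midpoint):
    """Logistic (S-curve) value at time t."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def fit_logistic(times, values, ceiling):
    """Estimate (rate, midpoint) by linear regression on logit(values / ceiling).

    Assumes the ceiling is known and every value lies strictly between
    0 and the ceiling, so that the logit transform is defined.
    """
    logits = [math.log(v / (ceiling - v)) for v in values]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logits) / n
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logits)) / sum(
        (t - mean_t) ** 2 for t in times
    )
    intercept = mean_y - slope * mean_t
    # logit = rate * t - rate * midpoint, hence midpoint = -intercept / rate
    return slope, -intercept / slope


# Synthetic, noise-free frequencies generated from a known curve (illustration only).
years = list(range(1800, 2001, 20))
frequencies = [logistic(y, ceiling=100.0, rate=0.05, midpoint=1900.0) for y in years]

rate, midpoint = fit_logistic(years, frequencies, ceiling=100.0)
print(rate, midpoint)
```

With real frequency data the ceiling would usually have to be estimated as well, for example with a non-linear least-squares routine; fixing it here keeps the sketch dependency-free.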
Pseudowords have long served as key tools in psycholinguistic investigations of the lexicon. A common assumption underlying the use of pseudowords is that they are devoid of meaning: Comparing words and pseudowords may then shed light on how meaningful linguistic elements are processed differently from meaningless sound strings. However, pseudowords may in fact carry meaning. On the basis of a computational model of lexical processing, linear discriminative learning (LDL; Baayen et al., Complexity, 2019, 1–39, 2019), we compute numeric vectors representing the semantics of pseudowords. We demonstrate that quantitative measures gauging the semantic neighborhoods of pseudowords predict reaction times in the Massive Auditory Lexical Decision (MALD) database (Tucker et al., 2018). We also show that the model successfully predicts the acoustic durations of pseudowords. Importantly, model predictions hinge on the hypothesis that the mechanisms underlying speech production and comprehension interact. Thus, pseudowords emerge as an outstanding tool for gauging the resonance between production and comprehension. Many pseudowords in the MALD database contain inflectional suffixes. Unlike many contemporary models, LDL captures the semantic commonalities of forms sharing inflectional exponents without using the linguistic construct of morphemes. We discuss methodological and theoretical implications for models of lexical processing and morphological theory. The results of this study, complementing those on real words reported in Baayen et al. (Complexity, 2019, 1–39, 2019), thus provide further evidence for the usefulness of LDL both as a cognitive model of the mental lexicon, and as a tool for generating new quantitative measures that are predictive for human lexical processing.
Nonwords are often used to clarify how lexical processing takes place in the absence of semantics. This study shows that nonwords are not semantically vacuous. We used Linear Discriminative Learning (Baayen et al., 2019) to estimate the meanings of nonwords in the MALD database (Tucker et al., 2018) from the speech signal. We show that measures gauging nonword semantics significantly improve model fit for both acoustic durations and RTs. Although nonwords do not evoke meanings that afford conscious reflexion, they do make contact with the semantic space, and the angles and distances of nonwords with respect to actual words co-determine articulation and lexicality decisions.
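To make the notion of "angles and distances of nonwords with respect to actual words" more concrete, the sketch below compares a nonword's semantic vector with word vectors using cosine similarity (an angle-based measure) and Euclidean distance. The vectors, the nonword, and all function names are invented for illustration; in the studies above such vectors are estimated with Linear Discriminative Learning, which is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_word(nonword_vector, word_vectors):
    """Return the word whose vector has the highest cosine similarity to the nonword."""
    return max(
        word_vectors,
        key=lambda word: cosine_similarity(nonword_vector, word_vectors[word]),
    )


# Hypothetical two-dimensional semantic vectors (real semantic spaces are far larger).
word_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
nonword_vector = [0.9, 0.1]  # invented vector for an invented nonword

closest = nearest_word(nonword_vector, word_vectors)
print(closest, euclidean_distance(nonword_vector, word_vectors[closest]))
```

Measures of this kind (similarity to the nearest word, distance to it, and so on) are the sort of quantities that could then be entered as predictors of durations or reaction times in a statistical model.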