This dataset provides replication data for an article on differential object marking in early Slavonic. The article uses extensive treebank data from the PROIEL and TOROT treebanks to track the much-debated rise of the animacy category in Russian, which in this article will be analysed as a change from at least partly definiteness-driven differential object marking in Old Church Slavonic via constructionally conditioned variation in Old East Slavonic to fully fledged animacy subgender marking in late Middle Russian. The change is interesting from a methodological point of view as well, since it requires us to annotate data through an ongoing change, and also since conventional treebank annotation is not enough to capture the conditions of the observed variation and change: annotation for semantics and information structure is necessary too. The article describes and defends a conservative approach to annotation in the face of change: the analysis that fits the first attested stage of a change is retained as long as possible.
Dataset abstract: The dataset includes an annotated corpus sample of N = 2000 French sentences with se mettre à or commencer à (1000 observations of each verb). The sample was drawn from the literary corpus Frantext (FT) and the journalistic corpus Le Monde (1000 observations from each corpus). The sample is balanced for verb as well as corpus, so we have 500 observations for each Verb-Corpus combination. The data is annotated for 8 variables: Source (corpus), Verb, Mood & Tense, Event type, Adverb presence, Adverb token, and Adverb type. Article abstract: This article compares the usage of commencer à ‘to begin’ + Vinf. and se mettre à ‘to start’ + Vinf. in modern French. Using a corpus sample of 2000 observations, we examined the effects of Adverbial complementation, Event type (aspect), and Tense. Based on a mixed-effects logistic regression analysis, we found evidence for effects of Event type (se mettre à is associated with activities) and Tense (se mettre à seems to be associated with Passé Simple, Futur proche and Subjonctif présent, whereas commencer à is associated with Plus-que-parfait and Indicatif Imparfait). We discuss the results in the frame-semantic model of Croft (2012). We make the case that commencer à can have the profile of an achievement or that of an accomplishment, while se mettre à manifests only one profile, i.e. that of an achievement. Our results support a one-component approach to aspect in which the result of the interaction between grammatical aspect and lexical aspect can be attributed to the same aspectual contour. References: Croft, William (2012). Verbs: aspect and causal structure. Oxford: Oxford University Press.
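The balanced sampling design described above can be checked with a few lines of Python. This is only an illustrative sketch built from the counts stated in the abstract; the variable names are hypothetical and do not reflect the column names of the actual data files.

```python
from itertools import product

# Factor levels as named in the abstract; identifiers are illustrative only.
verbs = ["se mettre à", "commencer à"]
corpora = ["Frantext", "Le Monde"]
PER_CELL = 500  # observations per Verb-Corpus combination, as stated

# One entry per cell of the 2 x 2 design.
design = {cell: PER_CELL for cell in product(verbs, corpora)}

total = sum(design.values())
per_verb = {v: sum(n for (verb, _), n in design.items() if verb == v) for v in verbs}
per_corpus = {c: sum(n for (_, corp), n in design.items() if corp == c) for c in corpora}

print(total, per_verb, per_corpus)
```

The marginal totals come out at 1000 per verb and 1000 per corpus, which is what "balanced for verb as well as corpus" amounts to.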
[Article abstract] This multifactorial study reviews the determinants of particle alternation after uninflected try in varieties where English is native. The effects of a number of previously discussed and novel predictors are probed in data from well-known corpora. The results confirm the inclinations of North American varieties (try to) in contrast with those of the Australasian, British and Irish varieties (try and in speech but try to in writing). The previously reported general effects of the tense of try, mode and horror aequi are also corroborated. As regards the effect of register, the study contributes the finding that following Latin-based infinitives favor try to in most varieties, especially in writing. The paper discusses the status of the substantiated effects with respect to the notions of conventionalization and entrenchment: crucially, the higher degree of conventionalization of try to in North American varieties (a) makes the use of this variant less conditional on the sequential need to license euphony and (b) neutralizes the general contextual/register distinction for the alternation. From a usage-based viewpoint, the findings suggest that the higher frequency of a multiword sequence in a specific variety, and the higher degree of activation in the language users’ minds, can make it less contingent on general probabilistic constraints. [Dataset abstract] This is the data and code from a multifactorial study reviewing the determinants of particle alternation after uninflected try in native varieties of English. The effects of a number of previously discussed and novel predictors (see Section 3.1 of the paper) are probed in data from well-known corpora (ICE, GloWbE, BNC and COCA). The paper is published in English Language and Linguistics (https://www.doi.org/10.1017/S1360674321000393). I used R (R Core Team 2021) for all data analyses, hence the code can best be replicated in R.
The dataset includes examples of usages of groza and ugroza from the Russian National Corpus (RNC). The dataset covers the period from 1700 to 2020 and consists of 4858 examples, where 2335 are examples of groza and 2523 of ugroza.
Dataset abstract: This dataset contains the results from 100 native (L1) Dutch speakers from Flanders (Belgium). These participants completed (i) a lexical decision task and (ii) a phoneme categorisation task. In the lexical decision task, participants were exposed to the accented speech of one Italian L1 speaker of Dutch who pronounced 40 target words with either canonical productions of the /ɪ/-vowel but ambiguous realisations of the /i/-vowel (e.g., vlinder 'butterfly' as [ˈvlɪn.dər], but diefstal 'theft' as [ˈdi/ɪf.stɑl]), or vice versa. Participants’ comprehension of the target words was measured in terms of word endorsement (i.e. accepting or rejecting target words as real Dutch words) and response time (i.e. the time interval between the end of stimulus presentation and participant response). The phoneme categorisation task was then used to verify whether the Dutch L1 listeners are able to identify the two phonemes correctly and whether they perceive the ambiguous sounds as pronunciation variants of one of the front vowels. If so, the participants are expected to identify the ambiguous vowels in minimal /ɪ/-/i/ words (e.g., bid-bied 'pray'-'bid') predominantly as either /ɪ/ or /i/, depending on whether the /ɪ/- or /i/-words contained ambiguous vowels in the lexical decision task. Article abstract: Listeners can usually cope effortlessly with the extreme acoustic variability of spoken language. Although accented speech might initially pose a challenge, listeners have been shown to rapidly adjust their perceptual system in response to atypical sound productions in the auditory input by exploiting prior lexical knowledge (i.e., lexically-guided perceptual learning). Here, we aimed to gain further insight into how Dutch L1 listeners adapt to Italian-accented Dutch front vowels, and how short-term experience with one L2 speaker’s accent might help these listeners to interpret novel words and another L2 speaker’s accent.
Therefore, 100 Dutch-speaking Belgian participants were exposed to 40 Dutch target words with either /ɪ/ or /i/ as syllable nucleus. All stimuli were produced by a female native speaker of Italian who is highly proficient in Dutch, but has a noticeable Italian accent. There were two exposure conditions: participants either heard target words in which the /ɪ/-sound was replaced by an ambiguous sound in between [ɪ]-[i] and canonically produced /i/-words (/ɪ/-ambiguous condition), or the exact opposite pattern (/i/-ambiguous condition). To assess perceptual learning, participants needed to identify the front vowel in five Dutch /ɪ/-/i/ minimal pairs across two speaker conditions: listeners either heard stimuli produced by the same female speaker or stimuli produced by a male-sounding speaker, whose voice was created from the female speaker’s voice using the ‘change gender’ function in Praat. Neither for the female speaker nor for the male-sounding speaker did we observe auditory perceptual learning effects. That is, participants did not identify the ambiguous vowel in the minimal pairs significantly differently depending on the exposure condition to which they had been assigned. Suggestions for future research are proposed on how to obtain a better understanding of how native speakers process L2 accented speech.
This is the data from a study that applies Keymorph Analysis to the grammatical cases of nouns used in speeches by the Russian president V. Putin. The dataset includes: 1) metadata of the texts: twenty-nine transcripts of Putin's direct speech, produced between February 10, 2022 and March 2, 2022, which are the raw data in our study; 2) the sentences with the nouns meaning 'Russia', 'Ukraine', and 'NATO', extracted from the texts and tagged according to the grammatical cases of these nouns as well as the semantic meanings of the cases; 3) the calculated difference index (DIN*) values for the grammatical cases of these nouns. The DIN* was used as the effect size metric. The R code for creating the bar chart of the DIN* values is also provided.
This dataset contains tabular files with durational measurements of vowels in connected speech. The data include measurements for 62 German learners of English and 25 native speakers of English (BrE and AmE). In total, 105 vocalic intervals were measured for each speaker, totaling 6509 measurements for learners (1 missing) and 2618 measurements for native speakers (7 missing). The data were elicited with a reading task, where participants read out short dialogues. The German subjects are instructional-setting learners ranging from grade 5 (age: 11) to university. They are predominantly from northern Bavaria and represent a broad range of proficiency levels. Pronunciation ability was assessed with a foreign accent rating.
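The measurement totals above follow directly from the number of speakers, the 105 intervals per speaker, and the stated missing values. A minimal sketch of that arithmetic, using only figures from the description (the function name is hypothetical):

```python
# Expected measurement counts, using only figures stated in the description.
INTERVALS_PER_SPEAKER = 105

def expected_rows(n_speakers: int, n_missing: int) -> int:
    """Number of available measurements after subtracting missing ones."""
    return n_speakers * INTERVALS_PER_SPEAKER - n_missing

learner_rows = expected_rows(62, 1)  # German learners of English
native_rows = expected_rows(25, 7)   # native speakers of English

print(learner_rows, native_rows)
```

A check of this kind can be useful when loading the tabular files, to confirm that no rows were lost on import.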
This dataset compiles selected sentences from the MCVF and ARTFL-FRANTEXT corpora containing lexical items that evoke the Reveal Secret frame (as described in the ASFALDA French FrameNet) from the 13th-20th centuries. The data are semantically annotated, and are used in a research project on changes in the use of the metonymic argument alternations MEDIUM FOR SPEAKER and TOPIC FOR INFORMATION.
The dataset supports the research article "Salience-simplification strategy to markedness of causal subordinators: The case of “because” and “since” in argumentative essays". In total, the dataset marks features of 976 causal adverbial subordinations retrieved from student argumentative essays. Data points were extracted from three corpora. Specifically, all essays in NESSIE (Native English Speakers’ Similarly or Identically-prompted Essays, created by Xu Jiajin, 781 essays; 291,911 tokens) and argumentative essays in LOCNESS (the Louvain Corpus of Native English Essays, created by Granger, 323 essays; 230,138 tokens) were selected. Native argumentative essays from the Arts and Humanities disciplinary group of BAWE (British Academic Written English, created by Hilary Nesi) were chosen (512 essays; 1,360,932 tokens). In total, 1,616 essays comprising 1,882,981 tokens were examined. The dataset comprises 976 data points of causal subordinations conjoined by "because" and "since" in students' argumentative essays: 488 data points covering all "since" subordinations, and 488 randomly selected "because" subordinations. On these data points, ten contextual features that are potential predictors of people's choices between the causal subordinators "because" and "since" were annotated. The ten contextual features annotated are "position", "separation", "embeddedness", "initial adverbials", "sub-clause", "de-ranking", "clause-length ratio", "hedging terms", "clausal relationship", and "bridging". Overall, fourteen variables, including the ten contextual features, are annotated:
(1) "No." is the ID of each data point (an identifier);
(2) "subordinator" marks the logical subordinators (this categorical variable has two values: "because" and "since");
(3) "position" marks the position of the logical adverbial clause relative to the main clause (this categorical variable has two values: "preposed" or "postposed");
(4) "sep" indicates whether a separating punctuation mark exists between the subordinate and main clauses (this categorical variable has two values: "YES" or "NO");
(5) "embeddedness" indicates whether a complex sentence is embedded in a larger complex sentence (this categorical variable has two values: "YES" or "NO");
(6) "ini.adv" denotes whether an initial adverbial exists in the causal subordination (this categorical variable has two values: "YES" or "NO");
(7) "sub-clau" indicates whether the causal subordinate contains sub-clauses of any type (this categorical variable has two values: "YES" or "NO");
(8) "deranking" indicates whether the predicate of the subordinate clause is complete (this categorical variable has two values: "YES" or "NO");
(9) "sub.main.ratio" is the length ratio of the subordinate and main clauses in terms of word count (this numerical variable is converted to its natural logarithm for better interpretation);
(10) "hedging" indicates whether a hedging term exists in the subordinate clause (this categorical variable has two values: "YES" or "NO");
(11) "clau.rel" denotes the interclausal relationships on the general level (this categorical variable has two values: "direct" or "indirect");
(12) "spc.clau.rel2" denotes the interclausal relationships on the secondary level (this categorical variable has five values: "im", "rm", "asst", "inpr", and "sugg");
(13) "bridging" indicates whether the subordinate clause contains any information referring back to the preceding clause (this categorical variable has two values: "YES" or "NO");
(14) "source" shows the specific corpus each data point comes from (this categorical variable has three values: "NESSIE", "LOCNESS", or "BAWE").
This dataset was constructed to explore contextual features that discriminate between the causal subordinators "because" and "since" and to rank the effective features.
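As an illustration of the log-transformed "sub.main.ratio" variable described above, the following sketch computes the natural logarithm of a clause-length ratio. It rests on two assumptions that the description does not confirm: that the ratio is subordinate-clause length divided by main-clause length, and that words are counted by splitting on whitespace. The function name is hypothetical.

```python
import math

def log_length_ratio(subordinate: str, main: str) -> float:
    """Natural log of (subordinate word count / main word count).

    Assumptions (not stated in the dataset description): the ratio is
    subordinate over main, and words are whitespace-separated tokens.
    """
    sub_words = len(subordinate.split())
    main_words = len(main.split())
    if sub_words == 0 or main_words == 0:
        raise ValueError("both clauses must contain at least one word")
    return math.log(sub_words / main_words)

# Equal-length clauses give a ratio of 1, so the log value is 0.
equal = log_length_ratio("because it was raining", "we stayed at home")
# A subordinate clause longer than the main clause gives a positive value.
longer = log_length_ratio("since the heavy rain had not stopped", "we stayed")
print(equal, longer)
```

Under these assumptions, a positive value means the subordinate clause is longer than the main clause, a negative value means it is shorter, and zero means the two are equally long.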
This dataset contains the data for an article on the meanings of future tense forms in Russian. Abstract: The relationship between future time and future tense forms in Russian is complex. The forms traditionally attributed to the future tense in certain cases do not refer to future time. Those cases have previously been presented as a list and/or attributed to the sphere of modality. In this article, we suggest a data-driven approach to the spectrum of meanings of Russian future tense forms. We analyzed corpus data and discovered that 44% of perfective future forms and 22% of imperfective future forms do not unambiguously express future time meaning. Among the non-future time meanings that Russian future tense forms can express are Gnomic, Performative, Implicative, Hypothetical, Alternation, and Stable scenario. Furthermore, we propose that the meanings of the future tense constitute a radial category. Future time reference is the prototypical meaning of the future tense. The remaining meanings comprise extensions connected to the prototypical meaning. We describe the radial category with reference to Langacker’s (2008) model of tense and potentiality. Additionally, we explore the interaction of future tense and modality.