This dataset contains tabular files with acoustic measurements for prevocalic and non-prevocalic laterals produced by n = 62 German learners of English and n = 26 native speakers of English (BrE and AmE). The German subjects are instructional-setting learners ranging from grade 5 (age: 11) to university. They are predominantly from northern Bavaria and represent a broad range of proficiency levels. Pronunciation ability was assessed with a foreign accent rating. The data were elicited with a word list and each subject produced n = 15 tokens (n = 10 in non-prevocalic position, n = 5 in prevocalic position). Measurements are reported for the first two formants (F1 and F2). See ReadMe file for more details. Related publication: Soenning, Lukas. 2020. Phonological variation in German Learner English. University of Bamberg dissertation. DOI: 10.20378/irb-49135 Open access: https://fis.uni-bamberg.de/handle/uniba/49135
Publication abstract: A foundational goal of cognitive linguistics is to explain linguistic phenomena in terms of general cognitive strategies rather than postulating an autonomous language module (Langacker 1987: 12-13). Metonymy is identified among the imaginative capacities of cognition (Langacker 2009: 46-47). Whereas the majority of scholarship on metonymy has focused on lexical metonymy, this study explores the systematic presence of metonymy in word-formation. I argue that in many cases, the semantic relationships between stems, affixes, and the words they form can be analyzed in terms of metonymy, and that this analysis yields a better, more insightful classification than traditional descriptions of word-formation. I present a metonymic classification of suffixal word-formation in three languages: Russian, Czech, and Norwegian. The system of classification is designed to maximize comparison between lexical and word-formational metonymy. This comparison supports another central claim of cognitive linguistics, namely that grammar (in this case word-formation) and lexicon form a continuum (Langacker 1987: 18-19), since I show that metonymic relationships in the two domains can be described in nearly identical terms. While many metonymic relationships are shared across the lexical and grammatical domains, some are specific to only one domain, and the two domains show different preferences for SOURCE and TARGET concepts. Furthermore, I find that the range of metonymic relationships expressed in word-formation is more diverse than what has been found in lexical metonymy. There is remarkable similarity in word-formational metonymy across the three languages, despite their typological differences: Russian and Czech present lexicons comprised almost entirely of word-formational families (Dokulil 1962: 14), whereas Norwegian is more heavily invested in compounding.
Although this study is limited to three Indo-European languages, the goal is to create a classification system that could be implemented (perhaps with modifications) across a wider spectrum of languages. The study draws on three databases representing the types of suffixal word-formation found in Russian, Czech, and Norwegian together with their metonymic interpretations. Each type is recorded with the vehicle (the starting point of the metonymy, called the source in the published article), the target of the metonymy, and a single example. Other factors examined were the number of metonymy designations (vehicle-target pairs) for each suffix, whether a given metonymy designation is also attested in lexical metonymy, and whether a given metonymy designation can be reversed (e.g. both agent for action and action for agent).
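The two counts mentioned above (designations per suffix, and reversible designations) can be illustrated with a minimal Python sketch. It assumes each database row can be reduced to a (suffix, vehicle, target) triple. The suffix labels and rows are invented placeholders, not entries from the actual databases, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy rows: (suffix label, vehicle, target).
# These are placeholders for illustration, NOT data from the databases.
records = [
    ("suffix_A", "ACTION", "AGENT"),
    ("suffix_A", "ACTION", "INSTRUMENT"),
    ("suffix_B", "AGENT", "ACTION"),
    ("suffix_C", "LOCATION", "AGENT"),
]


def designations_per_suffix(rows):
    """Count the distinct vehicle-target pairs attested for each suffix."""
    by_suffix = defaultdict(set)
    for suffix, vehicle, target in rows:
        by_suffix[suffix].add((vehicle, target))
    return {suffix: len(pairs) for suffix, pairs in by_suffix.items()}


def reversible_designations(rows):
    """Return every vehicle-target pair whose mirror image is also attested."""
    pairs = {(vehicle, target) for _, vehicle, target in rows}
    return {(v, t) for (v, t) in pairs if (t, v) in pairs}


counts = designations_per_suffix(records)
reversible = reversible_designations(records)
```

In this toy input, only the pair ACTION/AGENT counts as reversible, because it occurs in both directions; the pair may be attested with different suffixes, as it is here.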
These are the data for a journal article on 'Accusative of Negation in 'Borderland' Polish'. The abstract of the article is below. The data consist of the annotated list of tokens of accusative vs. genitive of negation (=GenNeg.txt), excerpted manually from relevant sources documenting south-eastern 'Borderland' Polish as used in the city of Lviv until WWII. Three types of sources have been used for this study: i.) the surviving and published scripts of a weekly popular radio programme of Polish Radio Lwów ('Wesola Lwowska Fala'), mainly pre-WWII, conducted in the dialect (1933-1945), for a few of which the accompanying recordings have been recovered too; ii.) a recovered pre-WWII film production with dialogues predominantly in the dialect (1939); iii.) written texts in the dialect from Lviv-based satirical magazines, predominantly pre-WWI (1882-1917). The sources and the annotation of the tokens are detailed in the accompanying description of the data (=00_readme_file_for_GenNeg.txt). The tokens were annotated for various factors, pertaining to the case-marked noun, to the verb and to the type of clause. The aim was to establish the correlation between these factors and the selection of dialectal accusative vs. Standard Polish genitive of negation. Here is the abstract of the article: The paper aims at offering a descriptive analysis of case under sentential negation in the pre-World War II urban dialect of Lviv, one of the key historical south-eastern ‘Borderland’ varieties of Polish which developed under strong Ukrainian influence. In this dialect, the direct internal argument in negated sentences could surface either in the genitive or accusative case. This is in contrast to other varieties of Polish, including Standard Polish, where it must be in the genitive. A distributional analysis of the data available suggests that the variation was not random. 
It was conditioned by the semantics of the object: The accusative was available if the noun phrase was definite. The genitive was not subject to any constraints. I argue that this represents a mixed grammar of case under negation: the Standard Polish model as well as a dialectal model. The latter emerged under the influence of Ukrainian. This mixed model is ultimately based on the availability of two types of negation phrase in Lviv ‘Borderland’ Polish, one without any scope features as in Standard Polish, and one with a negated quantificational scope feature as in East Slavonic.
North Saami is replacing the use of possessive suffixes on nouns with a morphologically simpler analytic construction. Our data (>2K examples culled from >.5M words) track this change through three generations and parameters of semantics, syntax, and geography. Intense contact pressure on this minority language probably promotes morphological simplification, yielding an advantage for the innovative construction. The innovative construction is additionally advantaged because it has a wider syntactic and semantic range and is indispensable, whereas its competitor can always be replaced. The one environment where the possessive suffix is most strongly retained even in the youngest generation is in the Nominative singular case, and here we find evidence that the possessive suffix is being reinterpreted as a vocative case marker. The files make it possible to see all of our data and to do the statistical analysis and plots in R.
We describe and compare two tools for processing Middle Russian texts. Both tools provide lemmatization, part-of-speech and morphological annotation. One (“RNC”) was developed for annotating texts in the Russian National Corpus and is rule-based. The other one (“TOROT”) is being used for annotating the eponymous corpus and is statistical. We apply the two analyzers to the same Middle Russian text and then compare their outputs with high-quality manual annotation. Since the analyzers use different annotation schemes and spelling principles, we have to harmonize their outputs before we can compare them. The comparison shows that TOROT performs considerably better than RNC (lemmatization 69.8% vs. 47.3%, part of speech 89.5% vs. 54.2%, morphology 81.5% vs. 16.7%). If, however, we limit the evaluation set only to those tokens for which the analyzers provide a guess and in addition consider the RNC response correct if one of the multiple guesses it provides is correct, the numbers become comparable (88.5% vs. 91.9%, 93.9% vs. 95.2%, 81.5% vs. 86.8%). We develop a simple procedure which boosts TOROT lemmatization accuracy by 8.7% by using RNC lemma guesses when TOROT fails to provide one and matching them against the existing TOROT lemma database. We conclude that a statistical analyzer (trained on a large material) can deal with non-standardised historical texts better than a rule-based one. Still, it is possible to make the analyzers collaborate, boosting the performance of the superior one.
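The collaboration procedure described above (use RNC lemma guesses when TOROT fails to provide a lemma, and accept a guess only if it matches the existing TOROT lemma database) might be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the `harmonize` placeholder, and the toy lemma strings are all hypothetical, and the real harmonization of annotation schemes and spelling is considerably more involved.

```python
from typing import Dict, Iterable, Optional


def harmonize(lemma: str) -> str:
    """Placeholder spelling harmonization (assumption): trim and lowercase.

    The actual harmonization between the two analyzers is corpus-specific.
    """
    return lemma.strip().lower()


def build_lemma_index(torot_lemmas: Iterable[str]) -> Dict[str, str]:
    """Map harmonized spellings to the lemma forms stored in the database."""
    return {harmonize(lemma): lemma for lemma in torot_lemmas}


def lemma_with_fallback(
    torot_lemma: Optional[str],
    rnc_guesses: Iterable[str],
    lemma_index: Dict[str, str],
) -> Optional[str]:
    """Prefer the TOROT lemma; otherwise try RNC guesses against the index."""
    if torot_lemma:
        return torot_lemma
    for guess in rnc_guesses:
        match = lemma_index.get(harmonize(guess))
        if match is not None:
            # Return the database's own spelling of the matched lemma.
            return match
    return None


# Toy lemma database with placeholder strings (not real corpus data).
index = build_lemma_index(["alpha", "beta"])
```

Returning the database's own spelling, rather than the RNC spelling, keeps the output consistent with the lemmas TOROT already uses. An RNC guess with no match in the database is discarded, so the fallback never introduces lemmas that are new to the database.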
This dataset contains different measures of plosives produced by 16 Norwegian learners of French as a third language during a reading task and a repetition task. The data are extracted from two corpora collected within the framework of the IPFC project (Interphonologie du français contemporain): the Tromsø corpus with high school students, and the Oslo corpus with university students enrolled in a first-year course on French phonetics and phonology. The dataset contains four files: a readme file, the word list used during the reading and repetition tasks, a data file containing all measures, and a text file presenting average values and VOT ranges for the individual informants.
This data set contains recordings of five Icelandic speakers, collected for the MA project of the author on Icelandic pre-aspiration and vowel duration. In Hindi, vowels before aspirated consonants are longer than vowels followed by non-aspirated consonants. This research extended the enquiry to the effects of pre-aspiration on vowel duration.
Original data for a general-purpose multi-dimensional analysis model of register variation in Czech. This post contains a CSV data set of 137 linguistic features measured on 3428 Czech text chunks, and an R script which performs a factor analysis on this data set. The results of this factor analysis were used as a basis for an 8-dimensional model of register variation in Czech (see Related Publications), following the methodology introduced by Douglas Biber (see e.g. his 1988 seminal work Variation Across Speech and Writing for details on the methodology, or his 2014 article “Using multi-dimensional analysis to explore cross-linguistic universals of register variation” for a review of MDA results across a variety of languages). The data is derived from the Koditex corpus, which aims to be as diversified as possible, covering various forms of spoken and written (both print and on-line) Czech. In compiling this corpus, the purpose was to provide a solid empirical basis for a comprehensive general-purpose model of register variation in Czech. Apart from this data set and related publications, additional resources pertaining to the project are available via the czcorpus/mda GitHub repository.
This dataset contains tabular files with auditory classifications for onset /v/ produced by German learners of English. The data originate from two different studies. The data by Soenning (2020) include n = 62 speakers (instructional-setting learners from northern Bavaria; age: 11-30), who produced a total of n = 429 tokens. These data were elicited with two reading tasks, a word list and short question-answer sequences. The data by Rank (2016) include n = 26 speakers (mostly from southern Germany (Bavaria) who had learned English in an instructional context for at least 7 consecutive years; the sample may be considered as representing intermediate to advanced proficiency levels), who produced a total of n = 1144 tokens. These data were elicited with a word list and a reading passage. See ReadMe file for more details. Related publication: Soenning, Lukas. 2020. Phonological variation in German Learner English. University of Bamberg dissertation. DOI: 10.20378/irb-49135 Open access: https://fis.uni-bamberg.de/handle/uniba/49135
This is the data examined in the study of Modern Russian verbs formed with the prefixes VY- and IZ-, a native East Slavic prefix and a loan Church Slavonic prefix, both of which mean ‘out of’. The study provides a synchronic contrastive analysis of the two prefixes and discusses how semantically similar they are and what determines their distribution across Russian verbs. The dataset “VY_IZ_DATABASE_2019” provides replication data for the article “Two origins of the prefix IZ- and how they affect the VY- vs. IZ- correlation in Modern Russian” accepted for publication in Russian Linguistics. International Journal for the Study of Russian and other Slavic Languages 43(3). The amount of data examined in this study exceeds all previous accounts of the issue. The database contains 989 prefixed verbs. The verbs were culled from the Modern Subcorpus of the Russian National Corpus (www.ruscorpora.ru) and manually tagged for a number of parameters. The data was extracted automatically via the database management system MySQL. After that, each verb was double-checked in the corpus and analyzed. In the database, each verb is accompanied by an English gloss, simplex base, corpus frequency, corpus example of its use, and a number of tags relevant for this study (type of perfective, submeaning of the prefix, etc.). The structure of the database is described in detail in the document “ReadMe”. Here is the abstract of the article: This article reports on a synchronic study of 989 Modern Russian verbs formed with the prefixes VY- and IZ-, including standard lexemes, obsolete verbs, and newly-formed coinages culled from the Russian National Corpus. I argue that the hypothesis about the two historical origins of the prefix IZ- may explain the ambivalent behavior of this prefix in Modern Russian, which shows both semantic overlap and semantic contrast with the prefix VY-.
I revisit the most detailed semantic account of the two prefixes (Nesset et al. 2011) and provide additional support for their model of polysemy in terms of type and token frequencies of the analyzed verbs. I further propose that VY- and IZ- encode different spatial image schemas and thus explain why the prefix IZ- is compatible with verbs of multidirectional motion, whereas VY- preferably attaches to verbs of unidirectional motion; why the verbs prefixed in IZ- often carry a more evocative flavor and refer to more intensive activities than those described by parallel verbs in VY-; why IZ- encodes multiplication of an action named by the base and why this is not common for VY-; and finally how it is possible for IZ- to have both bookish and colloquial uses, being very obsolete and highly productive in different submeanings.