Michael Haslam; R. Adriana Hernandez-Aguilar; Tomos Proffitt; Adrián Arroyo; Tiago Falótico; Dorothy M. Fragaszy; Michael D. Gumert; John W.K. Harris; Michael A. Huffman; Ammie K. Kalan; Suchinda Malaivijitnond; Tetsuro Matsuzawa; William C. McGrew; Eduardo B. Ottoni; Alejandra Pascual-Garrido; Alex K. Piel; Jill D. Pruetz; Caroline Schuppli; Fiona A. Stewart; Amanda Tan; Elisabetta Visalberghi; Lydia V. Luncz;
Since its inception, archaeology has traditionally focused exclusively on humans and our direct ancestors. However, recent years have seen archaeological techniques applied to material evidence left behind by non-human animals. Here, we review advances made by the most prominent field investigating past non-human tool use: primate archaeology. This field combines survey of wild primate activity areas with ethological observations, excavations and analyses that allow the reconstruction of past primate behaviour. Because the order Primates includes humans, new insights into the behavioural evolution of apes and monkeys can also be used to better interrogate the record of early tool use in our own, hominin, lineage. This work has recently doubled the set of primate lineages with an excavated archaeological record, adding Old World macaques and New World capuchin monkeys to chimpanzees and humans, and it has shown that tool selection and transport, and discrete site formation, are universal among wild stone-tool-using primates. It has also revealed that wild capuchins regularly break stone tools in a way that can make them difficult to distinguish from simple early hominin tools. Ultimately, this research opens up opportunities for the development of a broader animal archaeology, marking the end of archaeology's anthropocentric era.
Publisher: Association for Computational Linguistics
Project: EC | ELG (825627)
Textbook Question Answering is a complex task at the intersection of Machine Comprehension and Visual Question Answering that requires reasoning with multimodal information from text and diagrams. For the first time, this paper taps the potential of transformer language models and bottom-up and top-down attention to tackle the language and visual understanding challenges this task entails. Rather than training a language-visual transformer from scratch, we rely on pre-trained transformers, fine-tuning and ensembling. We add bottom-up and top-down attention to identify regions of interest corresponding to diagram constituents and their relationships, improving the selection of relevant visual information for each question and answer option. Our system ISAAQ reports unprecedented success in all TQA question types, with accuracies of 81.36%, 71.11% and 55.12% on true/false, text-only and diagram multiple choice questions. ISAAQ also demonstrates its broad applicability, obtaining state-of-the-art results in other demanding datasets. Comment: Accepted for publication as a long paper in EMNLP2020
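The ensembling step can be sketched as follows. The combination rule used here (averaging each model's softmax distribution over the answer options and taking the argmax) is an illustrative assumption, not necessarily ISAAQ's exact scheme:

```python
import math

def softmax(logits):
    """Convert raw answer-option scores to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_answer(per_model_logits):
    """Average each fine-tuned model's softmax over the answer options
    and pick the option with the highest mean probability.
    (Illustrative sketch of ensembling, not ISAAQ's exact method.)"""
    probs = [softmax(logits) for logits in per_model_logits]
    n_options = len(probs[0])
    avg = [sum(p[i] for p in probs) / len(probs) for i in range(n_options)]
    return max(range(n_options), key=avg.__getitem__)
```

Averaging probabilities rather than raw logits keeps models with different score scales from dominating the vote.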
It is well known that real-time human language processing is highly incremental and context-driven, and that the strength of a comprehender’s expectation for each word encountered is a key determinant of the difficulty of integrating that word into the preceding context. In reading, this differential difficulty is largely manifested in the amount of time taken to read each word. While numerous studies over the past thirty years have shown expectation-based effects on reading times driven by lexical, syntactic, semantic, pragmatic, and other information sources, there has been little progress in establishing the quantitative relationship between expectation (or prediction) and reading times. Here, by combining a state-of-the-art computational language model, two large behavioral datasets, and non-parametric statistical techniques, we establish for the first time the quantitative form of this relationship, finding that it is logarithmic over six orders of magnitude in estimated predictability. This result is problematic for a number of established models of eye movement control in reading, but lends partial support to an optimal perceptual discrimination account of word recognition. We also present a novel model in which language processing is highly incremental well below the level of the individual word, and show that it predicts both the shape and time-course of this effect. At a more general level, this result provides challenges for both anticipatory processing and semantic integration accounts of lexical predictability effects. And finally, this result provides evidence that comprehenders are highly sensitive to relative differences in predictability – even for differences between highly unpredictable words – and thus helps bring theoretical unity to our understanding of the role of prediction at multiple levels of linguistic structure in real-time language comprehension.
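The logarithmic form of the relationship can be made concrete: if reading time is linear in surprisal (the negative log of a word's contextual probability), then any fixed multiplicative drop in probability adds a constant amount of reading time, anywhere on the probability scale. The coefficients below are illustrative placeholders, not estimates from the paper:

```python
import math

def surprisal(p):
    """Surprisal in bits for a word with contextual probability p."""
    return -math.log2(p)

def predicted_rt(p, baseline_ms=200.0, ms_per_bit=10.0):
    """Reading time under a logarithmic linking function:
    RT = baseline + slope * surprisal. Coefficients are illustrative."""
    return baseline_ms + ms_per_bit * surprisal(p)

# A tenfold drop in probability adds the same constant amount of reading
# time whether it occurs among likely or highly unlikely words:
delta_high = predicted_rt(0.01) - predicted_rt(0.1)
delta_low = predicted_rt(0.00001) - predicted_rt(0.0001)
```

This constancy across six orders of magnitude is exactly what distinguishes the logarithmic account from, say, a linear-in-probability one, where the effect of a tenfold drop shrinks toward zero for already-unlikely words.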
Sound correspondence patterns play a crucial role for linguistic reconstruction. Linguists use them to prove language relationship, to reconstruct proto-forms, and for classical phylogenetic reconstruction based on shared innovations. Cognate words which fail to conform with expected patterns can further point to various kinds of exceptions in sound change, such as analogy or assimilation of frequent words. Here we present an automatic method for the inference of sound correspondence patterns across multiple languages based on a network approach. The core idea is to represent all columns in aligned cognate sets as nodes in a network with edges representing the degree of compatibility between the nodes. The task of inferring all compatible correspondence sets can then be handled as the well-known minimum clique cover problem in graph theory, which essentially seeks to split the graph into the smallest number of cliques in which each node is represented by exactly one clique. The resulting partitions represent all correspondence patterns which can be inferred for a given dataset. By excluding those patterns which occur in only a few cognate sets, the core of regularly recurring sound correspondences can be inferred. Based on this idea, the paper presents a method for automatic correspondence pattern recognition, which is implemented as part of a Python library which supplements the paper. To illustrate the usefulness of the method, we present how the inferred patterns can be used to predict words that have not been observed before.
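The network idea can be sketched with a greedy approximation to the minimum clique cover. The column representation, the compatibility rule (with '?' marking a missing reflex), the greedy strategy, and the toy data are all illustrative; the paper's own algorithm and data model may differ:

```python
def compatible(col_a, col_b):
    """Two alignment columns are compatible if they never disagree on a
    language for which both have an observed sound ('?' = missing)."""
    return all(a == b or '?' in (a, b) for a, b in zip(col_a, col_b))

def greedy_clique_cover(columns):
    """Greedy approximation to the minimum clique cover: assign each
    column to the first pattern whose members it is compatible with,
    otherwise open a new pattern. (Sketch, not the paper's algorithm.)"""
    patterns = []  # each pattern is a list of mutually compatible columns
    for col in columns:
        for pattern in patterns:
            if all(compatible(col, member) for member in pattern):
                pattern.append(col)
                break
        else:
            patterns.append([col])
    return patterns

# Toy columns over three languages (one sound per language per column):
cols = [
    ('t', 'd', 't'),
    ('t', 'd', '?'),  # merges with the first column's pattern
    ('k', 'g', 'k'),
    ('?', 'g', 'k'),  # merges with the third column's pattern
]
patterns = greedy_clique_cover(cols)
```

Each resulting pattern stands for one recurring sound correspondence; filtering out patterns attested in only a few cognate sets would leave the regular core, as described in the abstract.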
In visual word identification, readers automatically access word internal information: they recognize orthographically embedded words (e.g., HAT in THAT) and are sensitive to morphological structure (DEAL-ER, BASKET-BALL). The exact mechanisms that govern these processes, however, are not well established yet - how is this information used? What is the role of affixes in this process? To address these questions, we tested the activation of meaning of embedded word stems in the presence or absence of a morphological structure using two semantic categorization tasks in Italian. Participants made category decisions on words (e.g., is CARROT a type of food?). Some no-answers (is CORNER a type of food?) contained category-congruent embedded word stems (i.e., CORN-). Moreover, the embedded stems could be accompanied by a pseudo-suffix (-er in CORNER) or a non-morphological ending (-ce in PEACE) - this allowed gauging the role of pseudo-suffixes in stem activation. The analyses of accuracy and response times revealed that words were harder to reject as members of a category when they contained an embedded word stem that was indeed category-congruent. Critically, this was the case regardless of the presence or absence of a pseudo-suffix. These findings provide evidence that the lexical identification system activates the meaning of embedded word stems when the task requires semantic information. This study brings together research on orthographic neighbors and morphological processing, yielding results that have important implications for models of visual word processing.
Four language production experiments examine how English speakers plan compound words during phonological encoding. The experiments tested production latencies in both delayed and online tasks for English noun-noun compounds (e.g., daytime), adjective-noun phrases (e.g., dark time), and monomorphemic words (e.g., denim). In delayed production, speech onset latencies reflect the total number of prosodic units in the target sentence. In online production, speech latencies reflect the size of the first prosodic unit. Compounds are metrically similar to adjective-noun phrases as they contain two lexical and two prosodic words. However, in Experiments 1 and 2, native English speakers treated the compounds as single prosodic units, indistinguishable from simple words, with RT data statistically different from that of the adjective-noun phrases. Experiments 3 and 4 demonstrate that compounds are also treated as single prosodic units in utterances containing clitics (e.g., dishcloths are clean) as they incorporate the verb into a single phonological word (i.e. dishcloths-are). Taken together, these results suggest that English compounds are planned as single recursive prosodic units. Our data require an adaptation of the classic model of phonological encoding to incorporate a distinction between lexical and postlexical prosodic processes, such that lexical boundaries have consequences for post-lexical phonological encoding.
Published: 13 May 2016
The present study investigated the proactive nature of the human brain in language perception. Specifically, we examined whether early proficient bilinguals can use interlocutor identity as a cue for language prediction, using an event-related potentials (ERP) paradigm. Participants were first familiarized, through video segments, with six novel interlocutors who were either monolingual or bilingual. Then, the participants completed an audio-visual lexical decision task in which all the interlocutors uttered words and pseudo-words. Critically, the speech onset started about 350 ms after the beginning of the video. ERP waves between the onset of the visual presentation of the interlocutors and the onset of their speech significantly differed for trials where the language was not predictable (bilingual interlocutors) and trials where the language was predictable (monolingual interlocutors), revealing that visual interlocutor identity can in fact function as a cue for language prediction, even before the onset of the auditory-linguistic signal. This research was funded by the Severo Ochoa program grant SEV-2015-0490, a grant from the Spanish Ministry of Science and Innovation (PSI2012-31448), FP7/2007-2013 Cooperation grant agreement 613465-AThEME, and an ERC grant from the European Research Council (ERC-2011-ADG-295362) to M.C. We thank Antonio Ibañez for his work in stimulus preparation.
Automated Essay Scoring has gained wider applicability and usage with the integration of advanced Natural Language Processing techniques, which enable in-depth analyses of discourse in order to capture the specificities of written texts. In this paper, we introduce a novel Automatic Essay Scoring method for the Dutch language, built within the Readerbench framework, which encompasses a wide range of textual complexity indices, as well as an automated segmentation approach. Our method was evaluated on a corpus of 173 technical reports automatically split into sections and subsections, thus forming a hierarchical structure on which textual complexity indices were subsequently applied. The stepwise regression model explained 30.5% of the variance in students’ scores, while a Discriminant Function Analysis predicted with substantial accuracy (75.1%) whether a student was a high or low performer.
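The headline regression statistic, the proportion of variance explained, is plain R-squared; a minimal illustrative implementation of the metric (not the paper's regression pipeline) is:

```python
def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by the predictions:
    R^2 = 1 - SS_res / SS_tot. Illustrative implementation of the
    statistic reported for the stepwise regression model."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot
```

An R-squared of 0.305, as reported, means the complexity indices jointly account for 30.5% of the score variance, with the remainder unexplained.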
Before the end of the first year of life, infants begin to lose the ability to perceive distinctions between sounds that are not phonemic in their native language. It is typically assumed that this developmental change reflects the construction of language-specific phoneme categories, but how these categories are learned largely remains a mystery. Peperkamp, Le Calvez, Nadal, and Dupoux (2006) present an algorithm that can discover phonemes using the distributions of allophones as well as the phonetic properties of the allophones and their contexts. We show that a third type of information source, the occurrence of pairs of minimally differing word forms in speech heard by the infant, is also useful for learning phonemic categories and is in fact more reliable than purely distributional information in data containing a large number of allophones. In our model, learners build an approximation of the lexicon consisting of the high-frequency n-grams present in their speech input, allowing them to take advantage of top-down lexical information without needing to learn words. This may explain how infants have already begun to exhibit sensitivity to phonemic categories before they have a large receptive lexicon.
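The protolexicon idea can be sketched as follows: collect high-frequency segment n-grams as stand-in word forms, then extract minimal pairs whose single differing segment is evidence for a phonemic (rather than allophonic) contrast. The toy data, the n-gram length, and the frequency threshold are illustrative, not the paper's model parameters:

```python
from collections import Counter

def ngrams(utterance, n):
    """All length-n segment windows in an utterance (a tuple of segments)."""
    return [tuple(utterance[i:i + n]) for i in range(len(utterance) - n + 1)]

def proto_lexicon(utterances, n=3, min_count=2):
    """Approximate lexicon of high-frequency n-grams, standing in for
    word forms without requiring word segmentation. (Illustrative.)"""
    counts = Counter(g for u in utterances for g in ngrams(u, n))
    return {g for g, c in counts.items() if c >= min_count}

def minimal_pairs(lexicon):
    """Pairs of equal-length forms differing in exactly one segment;
    the differing segments suggest a phonemic contrast."""
    pairs = []
    forms = sorted(lexicon)
    for i, a in enumerate(forms):
        for b in forms[i + 1:]:
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            if len(diffs) == 1:
                pairs.append((a, b, diffs[0]))
    return pairs

# Toy input: repeated 'bat' and 'pat' utterances as segment tuples.
utterances = [('b', 'a', 't'), ('b', 'a', 't'), ('p', 'a', 't'), ('p', 'a', 't')]
lexicon = proto_lexicon(utterances)
pairs = minimal_pairs(lexicon)
```

Here the single minimal pair isolates the /b/–/p/ contrast, the kind of top-down lexical evidence the abstract argues is more reliable than purely distributional information.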
As laughter research has developed in recent years, the need for materials with laughter annotations has also grown. In this study we examine how existing spontaneous speech resources can be leveraged toward this goal. We first analyze the process of manual laughter annotation in corpora, establishing two important parameters of the process: the amount of time required and its inter-rater reliability. Next, we propose a novel semi-automatic tool for laughter annotation, based on a signal-based representation of speech rhythm. We test both annotation approaches on the same recordings, containing German dyadic spontaneous interactions, and employing a larger pool of annotators than in previous work. We then compare and discuss the obtained results based on the two aforementioned parameters, highlighting the benefits and costs associated with each approach.
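Inter-rater reliability of the manual annotations could be quantified, for instance, with Cohen's kappa over frame-level laughter labels; the abstract does not specify the study's exact reliability measure, so this is an illustrative sketch:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' frame-level labels
    (e.g. 1 = laughter, 0 = no laughter): chance-corrected agreement,
    kappa = (p_observed - p_expected) / (1 - p_expected).
    Illustrative metric; the study's measure may differ."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k]
                   for k in set(counts_a) | set(counts_b)) / (n * n)
    if expected == 1:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)
```

Unlike raw percent agreement, kappa discounts the agreement expected by chance, which matters for sparse events like laughter, where two annotators who mostly label "no laughter" agree often by accident.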