Publisher: Association for Computational Linguistics
Project: EC | SUMMA (688139)
Two extensions to the AMR smatch scoring script are presented. The first extension combines the smatch scoring script with the C6.0 rule-based classifier to produce a human-readable report on the frequency of error patterns observed in the scored AMR graphs. This first extension yields a 4% gain over the state-of-the-art CAMR baseline parser, obtained by adding a manually crafted wrapper that fixes the identified CAMR parser errors. The second extension combines a per-sentence smatch with an ensemble method for selecting the best AMR graph among the set of AMR graphs for the same sentence. This second modification automatically yields a further 0.4% gain when applied to the outputs of two nondeterministic AMR parsers: a CAMR+wrapper parser and a novel character-level neural translation AMR parser. For the AMR parsing task, character-level neural translation attains a surprising 7% gain over carefully optimized word-level neural translation. Overall, we achieve smatch F1=62% on the SemEval-2016 official scoring set and F1=67% on the LDC2015E86 test set. NAACL HLT 2016, SemEval-2016 Task 8 submission
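The abstract does not state which ensemble rule selects the best graph. The sketch below assumes one plausible consensus rule: among the candidate AMR graphs for a sentence, pick the one with the highest mean pairwise score against the other candidates. It also simplifies smatch to exact overlap of (source, relation, target) triples with variable names assumed to be already aligned; real smatch additionally searches over variable mappings. The names `triple_f1` and `select_consensus` are hypothetical.

```python
# Simplified smatch-style consensus selection (sketch; variable alignment is assumed fixed).
Triple = tuple[str, str, str]  # (source, relation, target), e.g. ("w", "ARG0", "b")


def triple_f1(a: set[Triple], b: set[Triple]) -> float:
    """F1 of exact triple overlap; real smatch also searches over variable mappings."""
    matched = len(a & b)
    if matched == 0:
        return 0.0
    precision = matched / len(a)
    recall = matched / len(b)
    return 2 * precision * recall / (precision + recall)


def select_consensus(candidates: list[set[Triple]]) -> int:
    """Index of the candidate with the highest mean F1 against all other candidates."""
    if len(candidates) == 1:
        return 0

    def mean_agreement(i: int) -> float:
        scores = [triple_f1(candidates[i], other)
                  for j, other in enumerate(candidates) if j != i]
        return sum(scores) / len(scores)

    return max(range(len(candidates)), key=mean_agreement)


g1 = {("w", "instance", "want-01"), ("b", "instance", "boy"), ("w", "ARG0", "b")}
g2 = g1 | {("w", "ARG1", "g")}
g3 = {("w", "instance", "want-01"), ("b", "instance", "girl")}
print(select_consensus([g1, g2, g3]))  # 0
```

Here `g1` wins because it agrees most with the other two candidates on average (mean F1 of about 0.63, against about 0.60 for `g2` and 0.37 for `g3`).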
ABSTRACT: Since the end of the Late Middle Ages and throughout the Modern Era, some of the fishermen's associations established in the corregimiento of the Four Villas of the Coast managed to get the Monarchy to recognize their privilege of exercising a maritime jurisdiction in each brotherhood. The establishment of these jurisdictions displeased other institutions, which saw their own jurisdictional powers diminished. This situation gave rise to various conflicts in which the brotherhoods had to fight to preserve their maritime jurisdiction. This work was carried out within the framework of the Research Project Culturas urbanas en la España Moderna: policía, gobernanza e imaginarios (siglos XVI-XIX) (Urban cultures in Modern Spain: policing, governance and imaginaries, 16th-19th centuries), reference HAR2015-64014-C3-1-R, funded by the Ministerio de Economía y Competitividad (Ministry of Economy and Competitiveness), and of the European project Rebellion and Resistance in the Iberian Empires, 16th-19th Centuries, which has received funding from the European Union's Horizon 2020 research and innovation programme under Marie Skłodowska-Curie grant agreement No 778076.
In this paper, we investigate different approaches for dialect identification in Arabic broadcast speech. These methods are based on phonetic and lexical features obtained from a speech recognition system, and on bottleneck features using the i-vector framework. We studied both generative and discriminative classifiers, and we combined these features using a multi-class Support Vector Machine (SVM). We validated our results on an Arabic/English language identification task, with an accuracy of 100%. We also evaluated these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%. We further report results using the proposed methods to discriminate among five varieties of Arabic, namely the four most widely used dialects (Egyptian, Gulf, Levantine and North African) and MSA, with an accuracy of 59.2%. We discuss dialect identification errors in the context of code-switching between Dialectal Arabic and MSA, and compare the error patterns in manually labeled data with those in the output of our classifier. All the data used in our experiments have been released to the public as a language identification corpus.
The automatic extraction of a patient's natural history from Electronic Health Records (EHRs) is a critical step towards building intelligent systems that can reason about clinical variables and support decision making. Although EHRs contain a large amount of valuable information about the patient's medical care, this information can only be fully understood when analyzed in a temporal context. Any such intelligent system should therefore be able to extract medical concepts, date expressions, temporal relations and the temporal ordering of medical events from the free text of EHRs. This task is hard to tackle because of the domain-specific nature of EHRs, the writing quality and lack of structure of these texts, and, more generally, the presence of redundant information. In this paper, we introduce a new Natural Language Processing (NLP) framework that uses rule-based methods to extract the aforementioned elements from EHRs written in Spanish. We focus on building medical timelines, which include the disease diagnosis and its progression over time. On a large dataset of EHRs from patients with lung cancer, we show that our framework performs adequately: it correctly builds the timeline for 843 of 989 patients, i.e., in 85% of instances.
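As a rough illustration of a rule-based timeline step (not the authors' actual rules), the sketch below extracts dates in an assumed DD/MM/YYYY format, attaches each date to the sentence that contains it, and orders the resulting events chronologically. The regular expressions and `build_timeline` are hypothetical simplifications; the framework described above also extracts medical concepts and temporal relations, which this sketch ignores.

```python
import re
from datetime import date

# Assumed date format: DD/MM/YYYY (hypothetical rule for illustration).
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
# Naive sentence splitter: break after a period followed by whitespace.
SENTENCE_RE = re.compile(r"(?<=\.)\s+")


def build_timeline(note: str) -> list[tuple[date, str]]:
    """Pair every DD/MM/YYYY date with its sentence and sort the events chronologically."""
    events: list[tuple[date, str]] = []
    for sentence in SENTENCE_RE.split(note):
        for day, month, year in DATE_RE.findall(sentence):
            events.append((date(int(year), int(month), int(day)), sentence.strip()))
    events.sort(key=lambda event: event[0])
    return events


# Spanish example note: "Lung adenocarcinoma confirmed on 12/03/2015.
# Pulmonary nodule detected on 02/11/2014."
note = ("Se confirma adenocarcinoma de pulmón el 12/03/2015. "
        "Nódulo pulmonar detectado el 02/11/2014.")
for when, sentence in build_timeline(note):
    print(when.isoformat(), "|", sentence)
```

The nodule detection (2014-11-02) is listed before the confirmed diagnosis (2015-03-12) even though it appears later in the note, which is the kind of reordering a timeline builder must perform.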
Publisher: Association for Computational Linguistics
Country: United Kingdom
Project: EC | SUMMA (688139), EC | TraMOOC (644333)
This paper describes the AMU-UEDIN submissions to the WMT 2016 shared task on news translation. We explore methods of decode-time integration of attention-based neural translation models with phrase-based statistical machine translation. Efficient batch algorithms for GPU querying are proposed and implemented. For English-Russian, our system falls behind the state-of-the-art pure neural models in terms of BLEU. Among restricted systems, manual evaluation places it in the first cluster, tied with the pure neural model. For the Russian-English task, our submission achieves the top BLEU result, outperforming the best pure neural system by 1.1 BLEU points and our own phrase-based baseline by 1.6 BLEU. After manual evaluation, this system is the best restricted system in its own cluster. In follow-up experiments, we improve results by an additional 0.8 BLEU.
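Phrase-based SMT decoders typically rank hypotheses by a weighted (log-linear) sum of feature scores, and one way to integrate a neural model at decode time is to add its log-probability as one more feature. The sketch below assumes that setup; the feature names, values and weights are invented for illustration, and it rescores complete hypotheses rather than scoring partial hypotheses during search, with no GPU batching.

```python
# Each hypothesis: (translation, {feature_name: log-domain score}). Values are invented.
Hypothesis = tuple[str, dict[str, float]]


def loglinear_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of log-domain feature scores; features without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def best_hypothesis(hyps: list[Hypothesis], weights: dict[str, float]) -> str:
    """Return the translation with the highest log-linear score."""
    return max(hyps, key=lambda h: loglinear_score(h[1], weights))[0]


hyps = [
    ("hypothesis A", {"tm": -2.0, "lm": -3.0, "nmt": -5.0}),
    ("hypothesis B", {"tm": -2.5, "lm": -3.5, "nmt": -1.0}),
]
print(best_hypothesis(hyps, {"tm": 1.0, "lm": 1.0, "nmt": 0.0}))  # hypothesis A
print(best_hypothesis(hyps, {"tm": 1.0, "lm": 1.0, "nmt": 1.0}))  # hypothesis B
```

With the NMT weight at zero, the phrase-based features alone prefer A (score -5.0 vs. -6.0); giving the NMT feature a weight of 1.0 flips the choice to B (-7.0 vs. -10.0).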
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. In this paper, we explore ways to improve them. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and we overcome this bottleneck via language-specific components and deeper NMT architectures. We identify the off-target translation issue (i.e., translating into the wrong target language) as the major source of the inferior zero-shot performance, and propose random online backtranslation to enforce translation between language pairs unseen in training. Experiments on OPUS-100 (a novel multilingual dataset with 100 languages) show that our approach substantially narrows the performance gap with bilingual models in both one-to-many and many-to-many settings, and improves zero-shot performance by about 10 BLEU, approaching conventional pivot-based methods. Comment: ACL 2020
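A minimal sketch of the random online backtranslation idea, under stated assumptions: for each training pair whose target is in language t, a random language t' other than t is sampled, the target sentence is back-translated into t' with the current model, and the synthetic pair (t' source, original target) is added to the batch, exposing the model to a direction that may never occur in the training data. `translate` stands in for decoding with the model being trained; it, `robt_augment` and `toy_translate` are hypothetical, and uniform sampling over the remaining languages is an assumption.

```python
import random
from typing import Callable

# (source_text, target_text, target_language)
Example = tuple[str, str, str]


def robt_augment(
    batch: list[Example],
    languages: list[str],
    translate: Callable[[str, str], str],
    rng: random.Random,
) -> list[Example]:
    """Return the batch plus one back-translated synthetic example per original pair."""
    synthetic: list[Example] = []
    for _src, tgt, tgt_lang in batch:
        # Sample an intermediate language uniformly, excluding the target language.
        pivot = rng.choice([lang for lang in languages if lang != tgt_lang])
        # Back-translate the target sentence into the sampled language with the current model.
        new_src = translate(tgt, pivot)
        # New training pair: pivot-language source -> original target.
        synthetic.append((new_src, tgt, tgt_lang))
    return batch + synthetic


def toy_translate(text: str, lang: str) -> str:
    # Stand-in for decoding with the model being trained.
    return f"<{lang}> {text}"


batch = [("Hello", "Hallo", "de")]
print(robt_augment(batch, ["en", "de", "fr"], toy_translate, random.Random(0)))
```

Because the back-translation is produced online by the model itself, the synthetic pairs track the model as it trains; the cost is one extra decoding pass per training example.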
In this interview, James Green, a prominent Brazilianist, tells us about his interest in Brazilian history, his life as a civic and political activist against authoritarianism in Brazil and for gay and lesbian rights, and his academic work and career. Besides bringing his work to a wider audience of European historians and social scientists, the interview aims to reflect on the relationship between academic work and political and ideological activism, and to discuss the problems of subjectivism and the use of individual testimonies in the writing of contemporary history. We invited James Green to reflect on these matters so that he could share with us the views of someone who, because of the nature of his work, has had to confront such questions constantly.