Research data · Dataset · 2015 · Embargo end date: 16 May 2015

Europarl QTLeap WSD/NED corpus

Agirre, Eneko; Branco, António; Popel, Martin; Simov, Kiril;
Open Access
  • Published: 01 Jan 2015
  • Publisher: University of the Basque Country, UPV/EHU
Abstract
This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-610516 (http://qtleap.eu). The texts are sentences from the Europarl parallel corpus (Koehn, 2005). We selected the monolingual sentences from the parallel corpora for the following language pairs: Bulgarian-English, Czech-English, Portuguese-English and Spanish-English. The English corpus consists of the English side of the Spanish-English corpus. Since Basque is not part of Europarl, the corpus additionally contains the Basque and English sides of the GNOME corpus. The texts have been automatically annotated with NLP tools, including Word Sense Disambiguation, Named Entity Disambiguation an...
Funded by
EC | QTLEAP
Project
QTLEAP
Quality Translation by Deep Language Engineering Approaches
  • Funder: European Commission (EC)
  • Project Code: 610516
  • Funding stream: FP7 | SP1 | ICT
Communities
CLARIN
Digital Humanities and Cultural Heritage