Research data · Dataset · 2012

Prague Czech-English Dependency Treebank 2.0

Hajič, Jan; Hajičová, Eva; Panevová, Jarmila; Sgall, Petr; Cinková, Silvie; Fučíková, Eva; Mikulová, Marie; Pajas, Petr; Popelka, Jan; Semecký, Jiří; ...
Open Access
  • Published: 15 Jun 2012
  • Publisher: Linguistic Data Consortium
Abstract
The Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) is a major update of the Prague Czech-English Dependency Treebank 1.0 (LDC2004T25). It is a manually parsed Czech-English parallel corpus of over 1.2 million running words in almost 50,000 sentences for each part. The English part contains the entire Penn Treebank Wall Street Journal section (LDC99T42). The Czech part consists of Czech translations of all of the Penn Treebank-WSJ texts. The corpus is 1:1 sentence-aligned. This release also includes an additional automatic alignment on the node level (different for each annotation layer). The original Penn Treebank-like file structure ...
Funded by
EC | EUROMATRIXPLUS
Project
EUROMATRIXPLUS
Bringing Machine Translation for European Languages to the User
  • Funder: European Commission (EC)
  • Project Code: 231720
  • Funding stream: FP7 | SP1 | ICT
Communities
CLARIN
Digital Humanities and Cultural Heritage