Research data. Dataset. 2021. Embargo end date: 11 Mar 2021

Coreference in Universal Dependencies 0.1 (CorefUD 0.1)

Nedoluzhko, Anna; Novák, Michal; Popel, Martin; Žabokrtský, Zdeněk; Zeman, Daniel
Open Access
  • Published: 11 Mar 2021
  • Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Abstract
CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 0.1 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All the datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs located in the MISC column. The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The publicly available editi...
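Since the abstract says the coreference and bridging information lives as attribute-value pairs in the MISC column, a minimal Python sketch of reading that column may help. It relies only on the general CoNLL-U layout (ten tab-separated columns, MISC last, pairs separated by `|` and written as `Key=Value`, `_` for an empty field). The function names `parse_misc` and `iter_token_misc`, the sample sentence, and the attribute name `ExampleAttr` are hypothetical placeholders; the real attribute names used by CorefUD 0.1 are not given here and should be taken from the dataset's own documentation.

```python
from typing import Dict, Iterator, Tuple


def parse_misc(misc: str) -> Dict[str, str]:
    """Split a CoNLL-U MISC field into a dict of attribute-value pairs."""
    if not misc or misc == "_":  # "_" marks an empty field in CoNLL-U
        return {}
    pairs = {}
    for item in misc.split("|"):
        key, sep, value = item.partition("=")
        pairs[key] = value if sep else ""  # tolerate bare keys without "="
    return pairs


def iter_token_misc(conllu_text: str) -> Iterator[Tuple[str, str, Dict[str, str]]]:
    """Yield (ID, FORM, parsed MISC) for every token line in a CoNLL-U string."""
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        cols = line.split("\t")
        if len(cols) != 10:  # not a well-formed token line
            continue
        yield cols[0], cols[1], parse_misc(cols[9])


# Hypothetical two-token sentence; "ExampleAttr" is NOT a real CorefUD attribute.
SAMPLE = (
    "# sent_id = 1\n"
    "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\tExampleAttr=c1|SpaceAfter=No\n"
    "2\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    for token_id, form, misc in iter_token_misc(SAMPLE):
        print(token_id, form, misc)
```

Keeping the MISC parser separate from the line reader means the same function can be reused on any CoNLL-U file, and only the attribute names to look up need to change once the actual scheme is known.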
Funded by
Project: Bergamot (Browser-based Multilingual Translation)
  • Funder: European Commission (EC)
  • Project Code: 825303
  • Funding stream: H2020 | RIA
Communities
CLARIN
Digital Humanities and Cultural Heritage