Research data · Dataset · 2020 · Embargo end date: 09 Mar 2021

EvaLatin 2020: data

Sprugnoli, Rachele; Pellegrini, Matteo; Cecchini, Flavio Massimiliano; Passarotti, Marco
Open Access
  • Published: 01 Jan 2020
  • Publisher: CIRCSE Research Centre, Università Cattolica del Sacro Cuore
Abstract
Training and gold test data released for EvaLatin 2020, the evaluation campaign of NLP tools for Latin. The two shared tasks proposed in EvaLatin 2020, i.e. Lemmatization and Part-of-Speech tagging, were aimed at fostering research in the field of language technologies for Classical languages. The shared dataset consists of texts taken from the Perseus Digital Library, processed with UDPipe models and then manually corrected by Latin experts. The training set includes only prose texts by Classical authors. The test set includes prose texts by the same authors represented in the training set, as well as poetry and texts from the Medieval period.
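As a rough illustration of how lemma and part-of-speech annotations of this kind might be read, the sketch below assumes the files follow the tab-separated CoNLL-U layout that UDPipe produces (ID, FORM, LEMMA, UPOS, ...). That layout is an assumption, not something this record states. The function name `read_conllu` and the three-token sample are hypothetical and hand-written for this example; they are not drawn from the dataset.

```python
from typing import Iterator, List, Tuple

Token = Tuple[str, str, str]  # (form, lemma, upos)


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield sentences as lists of (form, lemma, upos) tuples.

    Assumes CoNLL-U layout: one token per line, tab-separated columns,
    '#' comment lines, and a blank line between sentences.
    """
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment or metadata
        cols = line.split("\t")
        if len(cols) < 4:
            continue  # not a token line in the assumed layout
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        sentence.append((cols[1], cols[2], cols[3]))
    if sentence:
        yield sentence


# Hand-written illustrative sample (hypothetical, not taken from the dataset).
SAMPLE = (
    "# text = puella rosam amat\n"
    "1\tpuella\tpuella\tNOUN\t_\t_\t_\t_\t_\t_\n"
    "2\trosam\trosa\tNOUN\t_\t_\t_\t_\t_\t_\n"
    "3\tamat\tamo\tVERB\t_\t_\t_\t_\t_\t_\n"
)

sentences = list(read_conllu(SAMPLE))
for form, lemma, upos in sentences[0]:
    print(form, lemma, upos)
```

Each yielded sentence pairs a surface form with the two annotations targeted by the shared tasks, which is enough to compare a tool's output against the gold columns token by token.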
Communities
  • Digital Humanities and Cultural Heritage
Funded by
EC | LiLa
Project
LiLa
Linking Latin. Building a Knowledge Base of Linguistic Resources for Latin
  • Funder: European Commission (EC)
  • Project Code: 769994
  • Funding stream: H2020 | ERC | ERC-COG