research data . Dataset . 2018

Annotated Corpus for Occitan

Bras, Myriam; Esher, Louise; Sibille, Jean; Vergez-Couret, Marianne
Open Access. Language: Occitan (post 1500); Provençal
  • Published: 22 Jun 2018
  • Publisher: Zenodo
This corpus contains a collection of Occitan texts that were manually annotated with parts of speech and lemmas. The corpus was produced in the context of the RESTAURE project, funded by the French ANR. The current version of the corpus contains 28 documents and 12,425 tokens. The annotation process is detailed in the following article: The annotated versions are provided in a tab-separated (TSV) CoNLL-U format.
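As a rough illustration of how tab-separated CoNLL-U files of this kind might be read, the following Python sketch parses token lines into dictionaries and counts part-of-speech tags. It assumes the standard ten-column CoNLL-U layout (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC); the corpus files may use fewer columns or different tag names, so the field list should be checked against the actual data. The function name `read_conllu` and the sample sentence are invented for illustration and are not taken from the corpus.

```python
import io
from collections import Counter

# Standard CoNLL-U column names. This layout is an assumption:
# the corpus files may deviate from it.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def read_conllu(stream):
    """Yield sentences as lists of token dicts from a CoNLL-U text stream."""
    sentence = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment or metadata line; skipped here.
            continue
        cols = line.split("\t")
        # Pad short rows with "_" (the CoNLL-U placeholder for "no value").
        cols += ["_"] * (len(CONLLU_FIELDS) - len(cols))
        sentence.append(dict(zip(CONLLU_FIELDS, cols)))
    if sentence:
        yield sentence


# Invented sample, NOT from the corpus: only POS and lemma are filled in,
# mirroring the annotation layers the description mentions.
SAMPLE = (
    "# text = Lo gat dormís .\n"
    "1\tLo\tlo\tDET\t_\t_\t_\t_\t_\t_\n"
    "2\tgat\tgat\tNOUN\t_\t_\t_\t_\t_\t_\n"
    "3\tdormís\tdormir\tVERB\t_\t_\t_\t_\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t_\t_\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(io.StringIO(SAMPLE)))
pos_counts = Counter(tok["upos"] for sent in sentences for tok in sent)
lemmas = [tok["lemma"] for tok in sentences[0]]
```

To run this on a downloaded file instead of the sample, the same function can be given an open file handle, for example `open(path, encoding="utf-8")`, where `path` is wherever the file was saved locally.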
free text keywords: Occitan, Corpus, Linguistics, Part Of Speech, Natural Language Processing, Lemma, FOS: Languages and literature
  • Digital Humanities and Cultural Heritage