Software, 2020

Pie Model for Classical French -- Part-of-Speech and Morphology (CATTEX2009-max)

Camps, Jean-Baptiste; Gabay, Simon; Clérice, Thibault; Cafiero, Florian
Open Access; language: French
  • Published: 04 Mar 2020
  • Publisher: Zenodo
Abstract
Pie model for Classical French, predicting part-of-speech and morphology tags (CATTEX2009-max). It was trained on a corpus of Classical French theatre.

More information:
- Corpus: Camps, Jean-Baptiste, & Cafiero, Florian. (2019). Stylometric Analysis of Classical French Theatre [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3353421
- F. Cafiero and J.B. Camps, Why Molière most likely did write his plays, Science Advances, 27 Nov 2019: Vol. 5, no. 11, eaax5489, DOI: 10.1126/sciadv.aax5489, https://advances.sciencemag.org/content/5/11/eaax5489/
- J.B. Camps, S. Gabay, Th. Clérice and F. Cafiero, Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre, to be published.

Current results on test data:

::: Evaluation report for task: pos :::
all: accuracy: 0.9701, precision: 0.92, recall: 0.8964, support: 4181
ambiguous-tokens: accuracy: 0.9229, precision: 0.9203, recall: 0.9175, support: 934
unknown-tokens: accuracy: 0.8165, precision: 0.4798, recall: 0.4904, support: 218

::: Evaluation report for task: MODE :::
all: accuracy: 0.9818, precision: 0.8765, recall: 0.8517, support: 4181
ambiguous-tokens: accuracy: 0.84, precision: 0.8483, recall: 0.7612, support: 125
unknown-tokens: accuracy: 0.8211, precision: 0.7256, recall: 0.658, support: 218

::: Classification report :::
| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| MODE=con    | 0.81      | 0.94   | 0.87     | 18      |
| MODE=imp    | 0.83      | 0.78   | 0.80     | 68      |
| MODE=ind    | 0.91      | 0.92   | 0.92     | 341     |
| MODE=sub    | 0.84      | 0.62   | 0.71     | 60      |
| MODE=x      | 0.99      | 1.00   | 1.00     | 3694    |
| avg / total | 0.88      | 0.85   | 0.86     | 4181    |

::: Evaluation report for task: TEMPS :::
all: accuracy: 0.9871, precision: 0.9305, recall: 0.9259, support: 4181
ambiguous-tokens: accuracy: 0.9135, precision: 0.623, recall: 0.6072, support: 104
unknown-tokens: accuracy: 0.8394, precision: 0.8693, recall: 0.5399, support: 218

::: Classification report :::
| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| TEMPS=fut   | 0.98      | 0.85   | 0.91     | 47      |
| TEMPS=ipf   | 0.93      | 0.88   | 0.90     | 16      |
| TEMPS=psp   | 0.80      | 1.00   | 0.89     | 4       |
| TEMPS=pst   | 0.95      | 0.91   | 0.93     | 334     |
| TEMPS=x     | 0.99      | 1.00   | 0.99     | 3780    |
| avg / total | 0.93      | 0.93   | 0.92     | 4181    |

::: Evaluation report for task: PERS :::
all: accuracy: 0.9859, precision: 0.9821, recall: 0.9668, support: 4181
ambiguous-tokens: accuracy: 0.942, precision: 0.9178, recall: 0.9188, support: 362
unknown-tokens: accuracy: 0.8394, precision: 0.9426, recall: 0.6344, support: 218

::: Classification report :::
| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| PERS.=1     | 0.98      | 0.96   | 0.97     | 429     |
| PERS.=2     | 0.97      | 0.97   | 0.97     | 258     |
| PERS.=3     | 0.99      | 0.94   | 0.96     | 410     |
| PERS.=x     | 0.99      | 1.00   | 0.99     | 3084    |
| avg / total | 0.98      | 0.97   | 0.97     | 4181    |

::: Evaluation report for task: NOMB :::
all: accuracy: 0.9797, precision: 0.9809, recall: 0.9733, support: 4181
ambiguous-tokens: accuracy: 0.7865, precision: 0.7511, recall: 0.6884, support: 192
unknown-tokens: accuracy: 0.8349, precision: 0.7918, recall: 0.7729, support: 218

::: Classification report :::
| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| NOMB.=p     | 0.98      | 0.95   | 0.97     | 545     |
| NOMB.=s     | 0.98      | 0.98   | 0.98     | 1831    |
| NOMB.=x     | 0.98      | 0.99   | 0.98     | 1805    |
| avg / total | 0.98      | 0.97   | 0.98     | 4181    |

::: Evaluation report for task: GENRE :::
all: accuracy: 0.9749, precision: 0.969, recall: 0.9685, support: 4181
ambiguous-tokens: accuracy: 0.9118, precision: 0.9063, recall: 0.9208, support: 465
unknown-tokens: accuracy: 0.7385, precision: 0.7097, recall: 0.6977, support: 218

::: Classification report :::
| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| GENRE=f     | 0.92      | 0.94   | 0.93     | 387     |
| GENRE=m     | 0.97      | 0.94   | 0.96     | 940     |
| GENRE=n     | 1.00      | 1.00   | 1.00     | 45      |
| GENRE=x     | 0.98      | 0.99   | 0.99     | 2809    |
| avg / total | 0.97      | 0.97   | 0.97     | 4181    |

::: Evaluation report for task: CAS :::
all: accuracy: 0.9983, precision: 0.9957, recall: 0.9901, support: 4181
ambiguous-tokens: accuracy: 0.9648, precision: 0.9796, recall: 0.9692, support: 199
unknown-tokens: accuracy: 1.0, precision: 1.0, recall: 1.0, support: 218

::: Classification report :::
| target      | precision | recall | f1-score | support |
|-------------|-----------|--------|----------|---------|
| CAS=i       | 1.00      | 1.00   | 1.00     | 46      |
| CAS=n       | 1.00      | 1.00   | 1.00     | 190     |
| CAS=r       | 0.98      | 0.96   | 0.97     | 128     |
| CAS=x       | 1.00      | 1.00   | 1.00     | 3817    |
| avg / total | 1.00      | 0.99   | 0.99     | 4181    |
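The per-class rows and the `avg / total` row of these classification reports can be cross-checked with a few lines of Python. The sketch below is an illustration, not Pie's own evaluation code. It uses the MODE report as printed (values rounded to two decimals), recomputes each f1-score as the harmonic mean of precision and recall, and compares an unweighted (macro) average with a support-weighted one. The names `MODE_REPORT`, `f1`, `macro_average` and `weighted_average` are hypothetical. With these numbers the macro averages reproduce the `avg / total` row (0.88 / 0.85 / 0.86), whereas the support-weighted precision comes out near 0.98. This suggests that the row, like the `all` precision and recall figures, is a macro average, but that is an inference from the numbers, not something the record states.

```python
from statistics import fmean

# Per-class rows of the MODE classification report, as printed (2-decimal rounding):
# label -> (precision, recall, f1-score, support)
MODE_REPORT = {
    "MODE=con": (0.81, 0.94, 0.87, 18),
    "MODE=imp": (0.83, 0.78, 0.80, 68),
    "MODE=ind": (0.91, 0.92, 0.92, 341),
    "MODE=sub": (0.84, 0.62, 0.71, 60),
    "MODE=x": (0.99, 1.00, 1.00, 3694),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(report: dict, column: int) -> float:
    """Unweighted mean of one column (0=precision, 1=recall, 2=f1) over all classes."""
    return fmean(row[column] for row in report.values())


def weighted_average(report: dict, column: int) -> float:
    """Mean of one column weighted by each class's support, for comparison."""
    total = sum(row[3] for row in report.values())
    return sum(row[column] * row[3] for row in report.values()) / total


if __name__ == "__main__":
    for label, (p, r, f, _support) in MODE_REPORT.items():
        print(f"{label:10} printed f1={f:.2f}  recomputed f1={f1(p, r):.3f}")
    print("macro precision:   ", round(macro_average(MODE_REPORT, 0), 3))
    print("weighted precision:", round(weighted_average(MODE_REPORT, 0), 3))
```

Because the inputs are already rounded, recomputed per-class f1 values can differ from the printed ones by up to about 0.01 (for example MODE=ind gives roughly 0.915 against a printed 0.92).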
Subjects
Free-text keywords: Natural language processing, Part-of-speech tagging, Classical French, French Language, Deep Learning
Communities
  • Digital Humanities and Cultural Heritage
Download from: Zenodo (Open Access; 3 versions available)