Advanced search in Research products
The following results are related to Digital Humanities and Cultural Heritage.
2 Research products, page 1 of 1

  • Digital Humanities and Cultural Heritage
  • Research software
  • Other research products
  • 2019-2023
  • AR
  • English

  • Open Access English
    Authors: Mechaca C., Ana L.; Marmanillo, Walter G.; Xamena, Eduardo; Ramirez-Orta, Juan; Maguitman, Ana Gabriela; Milios, Evangelos E.
    Country: Argentina

    Digital Humanities researchers often use software that helps them find non-trivial relationships among characters in historical text. The source texts that contain such information usually come from OCR-acquired volumes, which carry a large number of errors. This work explains the development of a web platform for OCR post-processing and ground-truth generation. The platform employs machine learning to predict the correct text from noisy OCR strings. The method used for this task involves transformers for character-based denoising language models. An active learning workflow is proposed: users can feed their corrections to the platform, generating new annotated data for re-training the underlying machine learning correction models.

    Sociedad Argentina de Informática e Investigación Operativa

  • Open Access English
    Authors: Xamena, Eduardo; Marmanillo, Walter Gabriel; Mechaca, Ana Lidia
    Country: Argentina

    Large numbers of ancient documents regarding Argentinian history have become available in recent years. This makes it possible to find interesting and useful aggregated information. This work proposes the application of Natural Language Processing, Text Mining and Visualization tools to Argentinian ancient document repositories. Conceptual maps and entity networks make up the first target of this preliminary paper. The first step is the normalization of OCR-acquired books of General Güemes. Exploratory analyses reveal the presence of manifold spelling errors, due to the OCR acquisition process of the volumes. We propose smart automatic ways of overcoming this issue in the process of normalization. In addition, a first topic landscape of a subset of volumes is obtained and analysed via Topic Modelling tools.

    Sociedad Argentina de Informática e Investigación Operativa

Powered by OpenAIRE graph