Advanced search in Research products
10 Research products, page 1 of 1

  • Digital Humanities and Cultural Heritage
  • Research software
  • Other research products
  • AR
  • English

Date (most recent)
  • Open Access English
    Authors: 
    Mechaca C., Ana L.; Marmanillo, Walter G.; Xamena, Eduardo; Ramirez-Orta, Juan; Maguitman, Ana Gabriela; Milios, Evangelos E.;
    Country: Argentina

    Digital Humanities researchers often rely on software that helps them find non-trivial relationships among characters in historical texts. The source texts that contain such information usually come from OCR-acquired volumes and therefore carry large numbers of errors. This work describes the development of a web platform for OCR post-processing and ground-truth generation. The platform employs machine learning, specifically character-based transformer denoising language models, to accurately predict the correct text from noisy OCR strings. An active learning workflow is proposed: users feed their corrections back to the platform, generating new annotated data for re-training the underlying correction models. Sociedad Argentina de Informática e Investigación Operativa
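The active-learning feedback loop described in the abstract can be sketched in a few lines. This is a toy illustration with invented names: the real platform uses a character-level transformer denoiser, not the dictionary lookup stubbed in here.

```python
# Toy sketch of the correction-and-feedback loop: predict, collect the
# user's fix, and fold it back into the training pool. The dictionary
# "model" is a stand-in for the transformer denoiser.

class CorrectionLoop:
    def __init__(self):
        # seed "model": known OCR error -> ground-truth mapping
        self.known = {"tbe": "the", "hiftory": "history"}
        self.training_pool = []          # accumulated annotated pairs

    def predict(self, token):
        """Stand-in for the character-based denoising model."""
        return self.known.get(token, token)

    def accept_user_correction(self, noisy, corrected):
        """User feedback becomes new annotated data for re-training."""
        self.training_pool.append((noisy, corrected))
        self.known[noisy] = corrected    # "retrain" the toy model

loop = CorrectionLoop()
print(loop.predict("tbe"))               # -> "the"
loop.accept_user_correction("Guemes", "Güemes")
print(loop.predict("Guemes"))            # -> "Güemes"
```

The key point the sketch captures is that every accepted correction enlarges the annotated pool, so the model improves as the platform is used.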

  • Open Access English
    Authors: 
    Xamena, Eduardo; Marmanillo, Walter Gabriel; Mechaca, Ana Lidia;
    Country: Argentina

    Large amounts of ancient documents regarding Argentinian history have become available in recent years, making it possible to extract interesting and useful aggregated information. This work proposes applying Natural Language Processing, Text Mining and Visualization tools to repositories of ancient Argentinian documents. Conceptual maps and entity networks are the first target of this preliminary paper. The first step is the normalization of OCR-acquired books about General Güemes. Exploratory analyses reveal manifold spelling errors introduced by the OCR acquisition of the volumes, and we propose smart automatic ways of overcoming this issue during normalization. In addition, a first topic landscape of a subset of volumes is obtained and analysed via Topic Modelling tools. Sociedad Argentina de Informática e Investigación Operativa
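One plausible "smart automatic" normalization step is fuzzy-matching OCR-garbled tokens against a reference vocabulary. The sketch below uses Python's stdlib `difflib` for this; the vocabulary and cutoff are illustrative, and the paper does not commit to this exact method.

```python
# Map OCR-damaged tokens to the closest word in a reference vocabulary.
import difflib

vocabulary = ["güemes", "general", "historia", "documento"]

def normalize(token, cutoff=0.7):
    """Return the closest vocabulary word, or the token unchanged
    when no candidate reaches the similarity cutoff."""
    matches = difflib.get_close_matches(token.lower(), vocabulary,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else token

print(normalize("generaI"))   # OCR confused l/I -> "general"
print(normalize("hist0ria"))  # OCR confused o/0 -> "historia"
```

Tokens with no close match pass through untouched, so correctly OCR'd text is never damaged by the normalizer.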

  • Open Access English
    Authors: 
    Grill, Pablo; Claassen, Mathias; Rosá, Aiala; Correa, Hernán;
    Country: Argentina

    This paper presents a series of semi-supervised learning algorithms designed to classify words or expressions with temporal meanings. The algorithms use a set of pre-tagged temporal expressions and a set of semantic classes defined within a research project on the lexical coding of temporal meaning in Spanish. The algorithms in this article are mostly based on word embeddings, but they also make use of other methods. The results strongly depend on the temporal classes considered, but for some classes precision reaches 90% or above. Sociedad Argentina de Informática e Investigación Operativa
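A minimal embedding-based labelling scheme of the kind the abstract describes is nearest-centroid classification: average the vectors of the pre-tagged seed expressions per class, then assign each new expression to the closest centroid. The 2-d vectors and class names below are invented for illustration; the paper uses real word embeddings.

```python
# Nearest-centroid classification over toy word vectors.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# seed (pre-tagged) expressions per temporal class, as toy 2-d vectors
seeds = {
    "duration":  [[0.9, 0.1], [0.8, 0.2]],
    "frequency": [[0.1, 0.9], [0.2, 0.8]],
}

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

centroids = {label: centroid(vs) for label, vs in seeds.items()}

def classify(vec):
    """Assign the class whose centroid is most cosine-similar."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

print(classify([0.85, 0.15]))  # -> "duration"
```

Newly classified expressions can then be folded back into the seed sets, which is what makes the overall scheme semi-supervised.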

  • Other research product · 2016
    Open Access English
    Authors: 
    Argerich, Luis; Cano, Matías J.; Torre Zaffaroni, Joaquín;
    Country: Argentina

    In this paper we propose the application of feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks like document classification. In this work we show that feature hashing can be applied to obtain word embeddings in time linear in the size of the data. The results show that this algorithm, which requires no training, is able to capture the semantic meaning of words. We compare the results against GloVe, showing that they are similar. As far as we know, this is the first application of feature hashing to the word embedding problem, and the results indicate that it is a scalable technique with practical value for NLP applications. Sociedad Argentina de Informática e Investigación Operativa (SADIO)
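The core idea can be sketched directly: hash each context word into one of D buckets and let a word's vector be the accumulated counts of its context buckets. A single pass over the corpus suffices, which is where the linear-time, no-training claim comes from. The hash function, window size and dimensionality below are illustrative choices, not the paper's exact configuration.

```python
# Feature-hashed word embeddings: one pass, no training.
import hashlib

D = 16  # embedding dimensionality (number of hash buckets)

def bucket(word):
    """Deterministically hash a word into one of D buckets."""
    return int(hashlib.md5(word.encode()).hexdigest(), 16) % D

def embed(corpus_sentences, window=1):
    vectors = {}
    for sent in corpus_sentences:
        for i, w in enumerate(sent):
            vec = vectors.setdefault(w, [0.0] * D)
            # accumulate hashed context-word counts around position i
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vec[bucket(sent[j])] += 1.0
    return vectors

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vecs = embed(corpus)
# "cat" and "dog" share identical contexts, so their vectors agree
print(vecs["cat"] == vecs["dog"])  # -> True
```

Words that appear in similar contexts end up with similar vectors, which is the distributional property GloVe also exploits, here obtained without any optimization step.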

  • Open Access English
    Authors: 
    Rio Riande, María Gimena del; González Blanco García, Elena; Martínez Cantón, Clara; Curado Malta, Mariana;
    Country: Argentina

    This paper presents work in progress on the POSTDATA project, which aims to solve the interoperability issues that exist among digital poetry repertoires. These repertoires hold data on poetry metrics that is locked in their own databases and is not freely available to be compared or used by intelligent machines that could reason over it. The POSTDATA project will use Linked Open Data (LOD) technologies to overcome these interoperability problems. POSTDATA is developing a metadata application profile (MAP) for the digital poetry repertoires, a construct that enhances interoperability. This development follows the method for the development of MAPs (Me4MAP). A MAP for the digital poetry repertoires will enable them to structure their data with a common model in order to publish it as Linked Open Data. This paper presents how this MAP is being developed so far. Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
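What publishing repertoire data under a common model might look like can be sketched with subject-predicate-object triples, the data shape behind Linked Open Data. The `ex:`/`pd:` terms below are hypothetical placeholders; the actual POSTDATA MAP defines its own vocabulary.

```python
# Metrical facts as RDF-style triples under an invented vocabulary.
triples = [
    ("ex:poem42", "rdf:type",       "pd:Poem"),
    ("ex:poem42", "pd:metre",       "pd:Hendecasyllable"),
    ("ex:poem42", "pd:stanzaCount", "4"),
]

def objects_of(subject, predicate):
    """All objects for a given subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("ex:poem42", "pd:metre"))  # -> ['pd:Hendecasyllable']
```

Once every repertoire expresses its metrics with the same predicates, the same query works across all of them, which is precisely the interoperability the MAP is meant to provide.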

  • Open Access English
    Authors: 
    Garciarena Ucelay, María José; Villegas, María Paula; Cagnina, Leticia; Errecalde, Marcelo Luis;
    Country: Argentina

    Author Profiling is the task of predicting characteristics of the author of a text, such as age, gender, personality or native language. This task is of growing importance due to potential applications in security, crime detection and marketing, among others. An interesting question is how robust a classifier remains when it is trained on one dataset and tested on others with different characteristics, commonly called cross-domain experimentation. Although various cross-domain studies have been conducted for English datasets, work on Spanish has only recently begun. In this context, this work presents a study of cross-domain classification for the author profiling task in Spanish. The experimental results show that, by using corpora with different levels of formality, we can obtain robust classifiers for author profiling in Spanish. Red de Universidades con Carreras en Informática (RedUNCI) XII Workshop Bases de Datos y Minería de Datos (WBDDM)
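The cross-domain setup itself is simple to sketch: fit a classifier on one corpus and score it on a different one. The tiny bag-of-words model and the toy Spanish snippets below are invented for illustration; the study uses real corpora of varying formality and stronger classifiers.

```python
# Train on one corpus, evaluate on another (cross-domain).
from collections import Counter

def train(docs):
    """Accumulate per-class word counts from (label, text) pairs."""
    model = {}
    for label, text in docs:
        model.setdefault(label, Counter()).update(text.split())
    return model

def predict(model, text):
    """Pick the class whose training counts best cover the text."""
    words = text.split()
    return max(model, key=lambda lbl: sum(model[lbl][w] for w in words))

train_corpus = [("young", "jaja genial amigo"),
                ("adult", "estimado saludos cordiales")]
test_corpus  = [("young", "jaja amigo"),
                ("adult", "saludos estimado")]

model = train(train_corpus)
accuracy = sum(predict(model, t) == lbl
               for lbl, t in test_corpus) / len(test_corpus)
print(accuracy)  # -> 1.0
```

The interesting measurements come from making the train and test corpora genuinely different in register, which is exactly the formality contrast the paper exploits.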

  • Open Access English
    Authors: 
    Cardellino, Cristian; Alonso i Alemany, Laura;
    Country: Argentina

    We present SuFLexQA, a system for Question Answering that integrates deep linguistic information from verbal lexica into Quepy, a generic framework for translating natural language questions into a query language. We are participating in the QALD-3 contest to assess the main achievements and shortcomings of the system. Sociedad Argentina de Informática e Investigación Operativa
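The translation idea behind frameworks like Quepy can be sketched as pattern-to-template mapping: recognize a question shape and fill a query template. Both the pattern and the SPARQL below are invented for illustration and are far simpler than what SuFLexQA's lexical information supports.

```python
# Toy natural-language-question -> SPARQL translation.
import re

def to_sparql(question):
    """Match one hypothetical question pattern; return None otherwise."""
    m = re.match(r"Who wrote (.+)\?", question)
    if m:
        work = m.group(1)
        return (f'SELECT ?author WHERE {{ '
                f'?w rdfs:label "{work}" . ?w dbo:author ?author }}')
    return None

print(to_sparql("Who wrote Don Quixote?"))
```

Integrating verbal lexica, as the abstract describes, amounts to replacing the hard-coded pattern with linguistically informed rules over many verbs instead of one.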

  • Open Access English
    Authors: 
    Carrillo, Facundo; Cecchi, Guillermo; Sigman, Mariano; Fernández Slezak, Diego;
    Country: Argentina

    Latent Semantic Analysis (LSA) is a natural language processing tool that allows estimating the semantic distance between terms. The success of LSA depends mainly on the choice of training corpus, which has been studied principally for English. This study evaluates LSA with regional Spanish corpora, measuring performance on synonym identification. We found that performance was only slightly better than chance, in line with previous results. Since the standard LSA method cannot dynamically grow its training corpus, we combined multiple LSA models using classifiers and showed that the use of automatic classifiers increases performance. Sociedad Argentina de Informática e Investigación Operativa
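The model-combination step can be sketched with a simple voter: several "LSA models" each pick their best synonym candidate and a majority vote decides. The models here are stubbed as score dictionaries; real LSA requires an SVD of a term-document matrix, and the paper's combination uses trained classifiers rather than plain voting.

```python
# Combine several stub "LSA models" by majority vote on synonym choice.
def vote(models, word, candidates):
    scores = {c: 0 for c in candidates}
    for model in models:
        # each model votes for the candidate it scores highest
        best = max(candidates, key=lambda c: model.get((word, c), 0.0))
        scores[best] += 1
    return max(scores, key=scores.get)

# two stub models scoring semantic closeness of candidate pairs
model_a = {("coche", "auto"): 0.9, ("coche", "casa"): 0.2}
model_b = {("coche", "auto"): 0.7, ("coche", "casa"): 0.4}

print(vote([model_a, model_b], "coche", ["auto", "casa"]))  # -> "auto"
```

Combining models trained on different corpora is what lets the approach sidestep LSA's inability to grow a single training corpus incrementally.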

  • Open Access English
    Authors: 
    Rago, Alejandro; Marcos, Claudia A.; Diaz Pace, Andrés;
    Country: Argentina

    Recent trends in the software engineering community advocate the improvement of textual requirements using (semi-)automated tools. In particular, the detection of incomplete or understated concerns at early development stages holds potential, due to the negative effects of untreated concerns on development. Assistive tools can greatly help analysts get a quick picture of the requirements and narrow down the search for latent concerns. In this article, we present a tool called REAssistant that supports the process of discovering concerns in textual specifications. The tool relies on the UIMA framework and EMF-based technologies to provide an extensible architecture for concern-related analyses. Currently, the tool is configured to process textual use cases by means of a number of text analytics modules that identify lexical, syntactic and semantic entities in the specifications. We have conducted a preliminary evaluation of our tool in two case studies, obtaining promising results when comparing it to manual inspections and to another tool. Sociedad Argentina de Informática e Investigación Operativa
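In its simplest form, flagging latent concerns in use-case text can be sketched as matching lexical cues per concern. The cue lists below are invented; REAssistant itself runs UIMA analysis pipelines that go well beyond keyword matching.

```python
# Flag candidate concerns in a use-case sentence via lexical cues.
CONCERN_CUES = {
    "security":    {"login", "password", "authenticate"},
    "persistence": {"save", "store", "database"},
}

def detect_concerns(sentence):
    """Return the concerns whose cue words appear in the sentence."""
    words = set(sentence.lower().split())
    return sorted(c for c, cues in CONCERN_CUES.items() if words & cues)

print(detect_concerns("The user must login and save the record"))
# -> ['persistence', 'security']
```

Even this crude pass shows why the approach helps analysts: it surfaces sentences that touch cross-cutting concerns so manual inspection can focus there first.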

  • Open Access English
    Authors: 
    Funes, Ana; Dasso, Aristides;
    Country: Argentina

    At early stages of software system development, system requirements are often expressed in natural language. A number of techniques exist to extract useful information from these documents in order to construct a more precise, formal document that expresses the system requirements. Some of these techniques consist in identifying system use cases during requirements analysis. In particular, event-based techniques identify, from the elicited documents, the external events that a system must respond to, and then relate them to use cases and actors. These event lists are simpler than use cases and are a first step in building them. Although use cases have proven to be a useful tool for requirements specification and facilitate interaction with end users, they lack formality, giving rise to misinterpretations and misunderstandings. With this in mind, we propose a technique that integrates the understandability of graphical use case notation with the unambiguity of formal specifications, by supplementing the identified use cases, initially given as lists of external events, with an initial formal specification consisting of function signatures and sorts in the RAISE Specification Language (RSL). Taking as input the identified external events associated with each system use case, expressed in natural language, we process them with a natural language tool that produces a structured format from which, by applying a set of rules, we translate them into RSL function signatures. Sociedad Argentina de Informática e Investigación Operativa
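The final translation step can be sketched as string construction: given a structured event description, emit an RSL-style function signature over product types. The event format and naming convention below are invented for illustration; the paper derives both from the output of its natural language tool and its rule set.

```python
# Turn a structured external-event description into an RSL-style
# function signature (sorts joined with the product operator ><).
def to_rsl_signature(event, input_sorts, output_sort):
    name = event.replace(" ", "_").lower()
    return f"{name} : {' >< '.join(input_sorts)} -> {output_sort}"

sig = to_rsl_signature("Customer places order", ["Customer", "Order"], "System")
print(sig)  # -> "customer_places_order : Customer >< Order -> System"
```

Signatures generated this way give each informal event a precise, type-checked counterpart, which is the bridge between the event list and the full RSL specification.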