research product . 2013

Evaluation of LSA performance in Spanish using multiple corpus of text

Carrillo, Facundo; Cecchi, Guillermo; Sigman, Mariano; Fernández Slezak, Diego
Open Access English
  • Published: 01 Jan 2013
  • Country: Argentina
Abstract
Latent Semantic Analysis (LSA) is a natural language processing tool that allows estimating the semantic distance between terms. The success of LSA depends mainly on the choice of training corpus, which has been studied principally in English. This study examines LSA trained on regional Spanish corpora and evaluates its performance at identifying synonyms. We found that performance was slightly better than chance, consistent with previous results. The standard LSA method cannot dynamically increase the training corpus. By using classifiers, we combined multiple LSA models and showed that automatic classifiers increase performance.
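The synonym-identification evaluation described in the abstract can be illustrated with a minimal sketch. It assumes each LSA model is already available as a mapping from term to vector; the two toy models, their vectors, and the function names `pick_synonym` and `combined_pick` are invented for illustration and are not taken from the study. The combination step here is a plain average of cosine similarities across models, a simple stand-in for the trained classifiers the authors report using.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_synonym(model, target, candidates):
    """Choose the candidate whose vector is closest to the target in one model."""
    return max(candidates, key=lambda c: cosine(model[target], model[c]))


def combined_pick(models, target, candidates):
    """Choose the candidate with the highest mean similarity across models.

    Assumption: averaging replaces the trained classifier from the study.
    """
    def mean_score(candidate):
        scores = [cosine(m[target], m[candidate]) for m in models]
        return sum(scores) / len(scores)

    return max(candidates, key=mean_score)


# Hypothetical 2-dimensional term vectors standing in for two LSA models
# trained on different corpora (values are made up for this example).
model_a = {"auto": [0.9, 0.1], "coche": [0.8, 0.3],
           "mesa": [0.1, 0.9], "camino": [0.7, 0.2]}
model_b = {"auto": [0.2, 0.9], "coche": [0.3, 0.8],
           "mesa": [0.9, 0.1], "camino": [0.8, 0.5]}

candidates = ["coche", "mesa", "camino"]
choice_a = pick_synonym(model_a, "auto", candidates)
choice_b = pick_synonym(model_b, "auto", candidates)
choice_combined = combined_pick([model_a, model_b], "auto", candidates)
```

With these toy vectors, one model on its own prefers a distractor while the averaged score recovers the intended synonym, which is the kind of gain that combining models is meant to provide. A trained classifier would instead learn how much weight to give each model's similarity score.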
Sociedad Argentina de Informática e Investigación Operativa
Subjects
ACM Computing Classification System: InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; ComputingMethodologies_PATTERNRECOGNITION
free text keywords: Computer Science (Ciencias Informáticas), Latent Semantic Analysis, regional Spanish corpus, Natural Language Processing
Communities
  • Digital Humanities and Cultural Heritage