research data . Dataset . 2014

Computational Linguistic Analysis of Earthquake Collections

Bialousz, Kenneth; Kokal, Kevin; Orleans-Pobee, Kwamina; Wakeley, Christopher
Open Access English
  • Published: 01 Dec 2014
  • Country: United States
Abstract
This record includes PDF and Word versions of the final report, a ZIP file of source code, and PDF and PowerPoint versions of the final presentation. CS4984 is a newly offered class at Virginia Tech with a unit-based, project/problem-based learning curriculum. This class style builds on NSF-funded curriculum work for the field of digital libraries and related topics; in this class, it guides a student-based investigation of computational linguistics. The specific problem this report addresses is how to automatically generate a short summary of a corpus of articles about earthquakes. Such a summary should best represent the texts and incl...
Subjects
free text keywords: natural language processing, Hadoop, Mahout, LDA, K-means clustering, NLTK, Python, natural language generation, Solr, Stanford NER, part-of-speech tagging
Communities
Digital Humanities and Cultural Heritage