Publication . Article . 1996

Content Analysis - A Corpus Based Model

Kuang-Hua Chen; Hsin-Hsi Chen
Open Access
Published: 01 Dec 1996
Journal: Journal of Library Science, issue 11, pages 95-112 (ISSN: 1018-3817)
Publisher: National Taiwan University
An important step in understanding a text is to build its discourse structure through cohesion and coherence. However, building the discourse structure in turn depends on fully understanding the text, so many efforts along this line are neither automatic nor successful. This paper proposes a corpus-based model for texts based on 1) repetition of words, 2) importance of words, and 3) collocational semantics. It focuses on association norms of noun-noun relations and noun-verb relations, defined at the discourse level and the sentence level, respectively. Based on this model, a text partition algorithm is proposed to determine the boundaries of discourse structures, and a topic identification algorithm is also presented. The results of a series of experiments show that the proposed model is promising. (Article content in Chinese with English abstract)
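The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the nouns of each paragraph have already been extracted, uses pointwise mutual information as a stand-in for the noun-noun association norm, and places a single boundary where word repetition plus association between adjacent paragraphs is weakest. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def association_scores(paragraphs):
    """PMI of noun pairs co-occurring in the same paragraph.

    Assumption: PMI stands in for the paper's association norm,
    which the abstract does not define.
    """
    n = len(paragraphs)
    word_df = Counter()   # number of paragraphs containing each noun
    pair_df = Counter()   # number of paragraphs containing each noun pair
    for nouns in paragraphs:
        unique = sorted(set(nouns))
        word_df.update(unique)
        pair_df.update(combinations(unique, 2))
    return {
        pair: math.log2(count * n / (word_df[pair[0]] * word_df[pair[1]]))
        for pair, count in pair_df.items()
    }


def cohesion(left, right, scores):
    """Repeated nouns plus summed positive association across two paragraphs."""
    left_set, right_set = set(left), set(right)
    repeated = len(left_set & right_set)
    linked = sum(
        max(scores.get(tuple(sorted((a, b))), 0.0), 0.0)
        for a in left_set
        for b in right_set
        if a != b
    )
    return repeated + linked


def weakest_boundary(paragraphs, scores):
    """Index of the paragraph that starts after the least cohesive gap."""
    gaps = [
        cohesion(paragraphs[i], paragraphs[i + 1], scores)
        for i in range(len(paragraphs) - 1)
    ]
    return gaps.index(min(gaps)) + 1


# Toy input: each paragraph is a list of already-extracted nouns.
paragraphs = [
    ["bank", "loan"],
    ["bank", "loan", "rate"],
    ["river", "fish"],
    ["river", "fish", "boat"],
]
scores = association_scores(paragraphs)
boundary = weakest_boundary(paragraphs, scores)
```

On this toy input the weakest gap falls between the second and third paragraphs, where no nouns repeat and no associated pairs span the gap. A fuller treatment would also weight words by importance and add noun-verb relations at the sentence level, as the abstract describes.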
Subjects by Vocabulary

Library of Congress Subject Headings: lcsh:Bibliography. Library science. Information resources lcsh:Z

ACM Computing Classification System: ComputingMethodologies_DOCUMENTANDTEXTPROCESSING


Discourse Analysis, Information Retrieval, Natural Language Processing, DOAJ:Library and Information Science, DOAJ:Social Sciences

Related to Research communities
Digital Humanities and Cultural Heritage