Publication type: Article, Preprint (2018)

Generating automatically labeled data for author name disambiguation: an iterative clustering method

Jinseok Kim; Jinmo Kim; Jason Owen-Smith
Open Access
  • Published: 29 Nov 2018
  • Journal: Scientometrics, volume 118, pages 253-280 (ISSN: 0138-9130, eISSN: 1588-2861)
  • Publisher: Springer Science and Business Media LLC
Abstract
To train algorithms for supervised author name disambiguation, many studies have relied on hand-labeled truth data that are very laborious to generate. This paper shows that labeled training data can be automatically generated using information features such as email address, coauthor names, and cited references that are available from publication records. For this purpose, high-precision rules for matching name instances on each feature are decided using an external-authority database. Then, selected name instances in target ambiguous data go through the process of pairwise matching based on the rules. Next, they are merged into clusters by a generic entity res...
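The pipeline outlined in the abstract (rule-based pairwise matching, then merging matched pairs into clusters) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract is truncated before it names the merging step in full, so the union-find transitive merge used here is an assumption. The thresholds `MIN_SHARED_COAUTHORS` and `MIN_SHARED_REFERENCES`, the record fields, and the sample data are all hypothetical; in the paper, the matching rules are decided using an external-authority database rather than fixed by hand.

```python
from itertools import combinations

# Hypothetical thresholds, chosen only for illustration. The paper derives
# its high-precision rules from an external-authority database.
MIN_SHARED_COAUTHORS = 2
MIN_SHARED_REFERENCES = 2


def is_match(a, b):
    """Return True if two name instances satisfy any one matching rule."""
    if a["email"] and a["email"] == b["email"]:
        return True
    if len(a["coauthors"] & b["coauthors"]) >= MIN_SHARED_COAUTHORS:
        return True
    return len(a["references"] & b["references"]) >= MIN_SHARED_REFERENCES


def cluster(instances):
    """Merge pairwise matches transitively with union-find (an assumed
    stand-in for the merging step the abstract mentions)."""
    parent = {inst["id"]: inst["id"] for inst in instances}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(instances, 2):
        if is_match(a, b):
            parent[find(a["id"])] = find(b["id"])

    groups = {}
    for inst in instances:
        groups.setdefault(find(inst["id"]), []).append(inst["id"])
    return sorted(sorted(ids) for ids in groups.values())


# Made-up name instances sharing one ambiguous author name.
instances = [
    {"id": 1, "email": "jkim@example.edu",
     "coauthors": {"owen-smith j"}, "references": {"r1", "r2"}},
    {"id": 2, "email": "jkim@example.edu",
     "coauthors": {"lee s"}, "references": {"r8", "r9"}},
    {"id": 3, "email": None,
     "coauthors": {"lee s", "park h"}, "references": {"r8", "r9"}},
    {"id": 4, "email": "other@example.org",
     "coauthors": {"choi m"}, "references": {"r5"}},
]

clusters = cluster(instances)
print(clusters)
```

In this toy data, instances 1 and 3 satisfy no rule directly, yet they land in the same cluster because each matches instance 2 (by email and by shared references, respectively). Transitive merging of this kind is why the pairwise rules need to be high-precision: a single false match can join two otherwise separate clusters.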
Subjects
Free-text keywords: General Social Sciences, Library and Information Sciences, Computer Science Applications, Computer Science - Digital Libraries, Computer Science - Information Retrieval, Computer Science - Machine Learning, Author name, Artificial intelligence, Test data, Computer science, Cluster analysis, Pairwise comparison, Population, Labeled data, Natural language processing, Ambiguity, Email address