research data . Dataset . 2020

MOIED: Magi Open Information Extraction Dataset

Yichao Ji; Xinyang Liu; Kui Ma; Xuezhi Zhao; Qiao Sun
Restricted . Chinese
  • Published: 22 Feb 2020
  • Publisher: Zenodo
Abstract
Description: Magi Open Information Extraction Dataset (MOIED) is a Chinese Open IE dataset containing 7,618,181 records extracted from plain text across 3,319,763 webpages in various domains. Each record consists of a (subject, predicate, object) tuple, its associated confidence score, and context information. The dataset comprises 1,427,742 distinct facts covering 272,522 entities and 117,731 predicates. A notable property of MOIED is that each distinct fact has multiple records, with URLs referring to mentions in diverse contexts, which enables multiple-instance learning (MIL) and other correlative approaches. As a paragraph level...
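The record layout described above lends itself to a simple grouping step before any multiple-instance learning. The following is a minimal sketch, not the dataset's actual schema or loader: the field names (subject, predicate, object, confidence, context, url), the Record class, and both helper functions are hypothetical, and taking the maximum confidence per bag is only one possible aggregation, chosen here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One extracted mention. Field names are assumptions, not the MOIED schema."""
    subject: str
    predicate: str
    object: str
    confidence: float
    context: str
    url: str


def group_into_bags(records):
    """Group records that state the same (subject, predicate, object) fact.

    Each resulting list is a "bag" in the multiple-instance learning sense:
    several mentions of one fact, drawn from different contexts.
    """
    bags = defaultdict(list)
    for record in records:
        key = (record.subject, record.predicate, record.object)
        bags[key].append(record)
    return dict(bags)


def bag_confidence(bag):
    """Aggregate per-mention scores with a max; an illustrative choice only."""
    return max(record.confidence for record in bag)


if __name__ == "__main__":
    # Placeholder values, not real MOIED records.
    demo = [
        Record("S1", "P1", "O1", 0.9, "context a", "https://example.org/a"),
        Record("S1", "P1", "O1", 0.6, "context b", "https://example.org/b"),
        Record("S2", "P1", "O2", 0.7, "context c", "https://example.org/c"),
    ]
    for fact, bag in group_into_bags(demo).items():
        print(fact, len(bag), bag_confidence(bag))
```

Keying bags on the full tuple keeps every mention and its URL available to a downstream model, so the aggregation rule can be swapped without regrouping the data.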
Subjects
free text keywords: Magi, Natural Language Processing, Open Information Extraction, Information Extraction, Relation Extraction, Weak supervision, Multiple-instance Learning, Knowledge Base, Knowledge Graph, Semantic Network, Chinese, NLP, Open IE, OIE
Communities
Digital Humanities and Cultural Heritage
Download from
Zenodo
Dataset . 2020
Provider: Datacite
Zenodo
Dataset . 2020
Provider: Zenodo