The following results are related to Digital Humanities and Cultural Heritage. Interested in viewing more results? Visit OpenAIRE - Explore.
17 Research products, page 2 of 2

  • Digital Humanities and Cultural Heritage
  • Research data
  • Research software
  • Other research products
  • Chinese
  • Research data . 2021
    Chinese
    Authors: 
    Zengtao Jiao;
    Publisher: Science Data Bank

    Instructions for use: (1) This dataset was manually edited by Yidu Cloud Medicine according to the distribution of real medical records. (2) This dataset is a sample of the Yidu-N7K dataset on OpenKG. Yidu-N7K may be used only for academic research in natural language processing, not for commercial purposes.

    The Yidu-N7K dataset is derived from CHIP 2019 evaluation task 1, the "clinical terminology standardization task". Standardizing clinical terms is indispensable in medical statistics. In clinical practice there are often hundreds of different ways to write the same diagnosis, operation, medicine, examination, test, or symptom. Standardization (normalization) means finding the corresponding standard expression for each of these clinical expressions. Once terminology is standardized, researchers can carry out statistical analysis of electronic medical records (EMR). In essence, clinical terminology standardization is a semantic similarity matching task. However, because the original expressions are so diverse, a single matching model rarely achieves good results.

    Yidu Cloud, a leading medical artificial intelligence technology company, is also the first unicorn company to drive medical innovation solutions with data intelligence. With the mission of "data intelligence and green medical care" and the goal of "improving the relationship between human beings and diseases", Yidu Cloud uses data and artificial intelligence to help government, hospitals, and the wider industry tap the value of medical big data for intelligent government and civil use, and to build a nationwide big data platform for the medical industry with coordinated utilization and unified access. Since its establishment in 2013, Yidu Cloud has gathered world-renowned scientists and leading professionals to form a strong team. The company invests hundreds of millions of yuan every year in R&D and in building its service system, has built a medical data intelligence platform with large data processing capacity, high data integrity, and a transparent development process, and has obtained dozens of software copyrights and national invention patents.
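
    The description above frames terminology standardization as semantic similarity matching. The following is a minimal sketch of that idea, assuming character-bigram Jaccard overlap as the similarity measure. This is a naive baseline chosen for illustration, not the method used by Yidu Cloud or by CHIP 2019 participants; the function names and the toy English terms in the usage note are hypothetical and not taken from the dataset.

```python
from typing import Iterable, Set, Tuple


def char_bigrams(text: str) -> Set[str]:
    """Return the set of character bigrams of a stripped, lowercased string."""
    text = text.strip().lower()
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalize(mention: str, standard_terms: Iterable[str]) -> Tuple[str, float]:
    """Pick the standard term whose bigram set overlaps most with the mention.

    Assumption: surface overlap stands in for semantic similarity here.
    Raises ValueError if no standard terms are supplied.
    """
    grams = char_bigrams(mention)
    scored = [(term, jaccard(grams, char_bigrams(term))) for term in standard_terms]
    if not scored:
        raise ValueError("standard_terms must not be empty")
    return max(scored, key=lambda pair: pair[1])
```

    For example, `normalize("diabetes mellitus type 2", ["type 2 diabetes mellitus", "essential hypertension"])` selects the first term. Surface overlap of this kind fails on synonyms that share no characters, which is consistent with the description's remark that a single matching model rarely suffices.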

  • Restricted Chinese
    Authors: 
    Ji, Yichao; Liu, Xinyang; Ma, Kui; Zhao, Xuezhi; Sun, Qiao;
    Publisher: Zenodo

    Description: Magi Open Information Extraction Dataset (MOIED) is a Chinese Open IE dataset containing 7,618,181 records extracted from plain text across 3,319,763 webpages in various domains. Each record consists of a (subject, predicate, object) tuple, the associated confidence score, and the context information. The dataset comprises 1,427,742 distinct facts covering 272,522 entities and 117,731 predicates. A notable property of MOIED is that each distinct fact has multiple records, with URLs referring to mentions in diverse contexts, which enables multiple-instance learning (MIL) and other correlative approaches. Because MOIED is a paragraph-level Open IE dataset, at least 45.1% of its records can only be extracted by synthesizing information from multiple sentences. Magi is an extraction engine that continuously learns from the Internet and combines cross-referencing, timeline analysis, and other heuristics to mitigate the inevitable false positives in its extractions. All records in MOIED were randomly sampled from a database dump of magi.com in January 2020. To provide more reliable evaluation results, human annotators examined the dataset and selected 19,161 verified records for the dev and test sets.

    Disclaimers: The dataset is expected to be used in weakly supervised scenarios, since the records in the training set are not human-annotated and could be imprecise or erroneous. Records are not guaranteed to be universally correct. The correctness of an extraction should be evaluated against its context (specified by the URL). Each extraction was made at the time Magi visited the URL, so there is no guarantee that the URL is still accessible or that the content has remained unmodified since then. For legal and regulatory reasons, the webpage URLs are mostly ones accessible from Mainland China. Even so, the content of certain webpages, as well as the extraction results, could violate the laws and regulations of certain countries or regions. This dataset contains content from the Internet; for copyright reasons, please do not redistribute it or use it for non-research purposes.
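
    The record structure and the multiple-instance property described above can be sketched as follows. The field names, the `Record` class, and the max-pooled bag score are assumptions made for illustration; the description does not specify the on-disk schema or any particular MIL aggregation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# A distinct fact is identified by its (subject, predicate, object) triple.
Fact = Tuple[str, str, str]


@dataclass(frozen=True)
class Record:
    """One extraction record; field names are hypothetical."""
    subject: str
    predicate: str
    obj: str
    confidence: float
    context: str
    url: str


def group_into_bags(records: Iterable[Record]) -> Dict[Fact, List[Record]]:
    """Group records mentioning the same fact into one bag, as used in MIL."""
    bags: Dict[Fact, List[Record]] = defaultdict(list)
    for record in records:
        bags[(record.subject, record.predicate, record.obj)].append(record)
    return dict(bags)


def bag_score(bag: List[Record]) -> float:
    """Assumed aggregation: a bag is as credible as its most confident mention."""
    return max(record.confidence for record in bag)
```

    Grouping by triple turns several noisy mentions of one fact into a single training unit, so an imprecise individual record matters less than it would under per-record supervision.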

  • Chinese
    Authors: 
    Bulag, Uradyn E.; Burunsain, Borjigin; Dorjraa;
    Publisher: Kalmyk Cultural Heritage Documentation Project, University of Cambridge
    Country: United Kingdom

    This video shows Wang Yanhong explaining Dashdawa Mongol history to the representatives of five Dashdawa Mongol surname groups. He says that about 1,000 Ööld people first arrived at Chengde in 1757, followed by another group two years later, the same year the Anyuan Monastery in Chengde was built. Some years later, however, about 500 people were dispatched to Xinjiang to protect the Qing-Russian border areas. Sponsored by Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.

  • Restricted Chinese
    Authors: 
    Ji, Yichao; Liu, Xinyang; Ma, Kui; Zhao, Xuezhi; Sun, Qiao;
    Publisher: Zenodo

    Magi Entity Description Extraction Dataset (MEDED) contains 500,000 (entity, description, source URL) tuples extracted by the Magi system from Chinese webpages that were accessible from Mainland China in May 2019. The data were learned automatically and should not be considered to contain the opinion of any human individual, including the authors. The main content of the source URLs can be found in the Magi Practical Web Article Corpus.

  • Open Access Chinese
    Authors: 
    Yichao Ji; Xinyang Liu; Kui Ma; Xuezhi Zhao; Qiao Sun;
    Publisher: Zenodo

    Magi Practical Web Article Corpus (MPWAC) contains 10 million Chinese web articles totaling more than 10 billion words. The articles have been shuffled to mitigate negative effects on mini-batch based training. Each article has been cleaned so that only the main body of the text remains, without advertisements or other noise. The corpus was extracted in May 2019 by magi.com crawlers located inside Mainland China. The merged file should be processed as a Gzip stream. It is encoded in UTF-8 and has the format '0xFE [URL] 0xFF [TEXT] ...'.
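
    A minimal sketch of reading this format as a stream, assuming that 0xFE and 0xFF are single marker bytes placed before each URL and each article body (neither byte can occur in valid UTF-8, so they cannot collide with the text) and that the spaces in the format string are notational. The function name, the chunk size, and the whitespace stripping are illustrative choices, not part of the corpus documentation.

```python
import gzip
import io
from typing import Iterator, Tuple

URL_MARK = b"\xfe"   # assumed to precede each URL
TEXT_MARK = b"\xff"  # assumed to precede each article body


def _decode(record: bytes) -> Tuple[str, str]:
    """Split one 'URL 0xFF TEXT' record and decode both halves as UTF-8."""
    url, sep, text = record.partition(TEXT_MARK)
    if not sep:
        raise ValueError("record is missing the 0xFF text marker")
    return url.decode("utf-8").strip(), text.decode("utf-8").strip()


def iter_articles(stream: io.BufferedIOBase,
                  chunk_size: int = 1 << 20) -> Iterator[Tuple[str, str]]:
    """Yield (url, text) pairs from an already-decompressed byte stream.

    Only complete records are decoded, so a multi-byte character split
    across two chunks is never decoded prematurely.
    """
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last 0xFE is complete; the tail may continue.
        *complete, buffer = buffer.split(URL_MARK)
        for record in complete:
            if record:
                yield _decode(record)
    if buffer:
        yield _decode(buffer)


def open_corpus(path: str) -> Iterator[Tuple[str, str]]:
    """Open a gzip-compressed corpus file (path is hypothetical) and stream it."""
    with gzip.open(path, "rb") as stream:
        yield from iter_articles(stream)
```

    Iterating over `open_corpus(path)` keeps only one chunk plus one partial record in memory at a time, so the full corpus never needs to be decompressed to disk or loaded whole.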

  • Open Access Chinese
    Authors: 
    Anderl, Christoph; Bingenheimer, Marcus; Chang, Po-yung; Lin, Ching-hui; Joey, Hung; Bell, Christian; Schrupp, Jan;
    Country: Belgium
  • Research data . Film . 2005
    Chinese
    Authors: 
    Gowlland, Geoffrey;
    Country: United Kingdom

    Filmed by Geoffrey Gowlland in Dingshan, Yixing, Jiangsu Province of China, in October 2004. Artist Zhao Jianghua demonstrates the making of one of the key tools used in Yixing pottery, and comments on the proper way of doing so.
