Publication . Article . 2018

Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis.

Guocai Chen; Juan Camilo Ramírez; Nan Deng; Xing Qiu; Canglin Wu; W. Jim Zheng; Hulin Wu
Open Access
Published: 24 Feb 2018 Journal: Database: the journal of biological databases and curation, volume 2019 (ISSN: 1758-0463)
Abstract

Motivation: Gene Expression Omnibus (GEO) and other publicly available data repositories store their metadata as unstructured English text, which is very difficult to reuse automatically.

Results: We employed text mining techniques to analyze the metadata of GEO and developed the Restructured GEO database (ReGEO). ReGEO reorganizes and categorizes GEO series and makes them searchable by two new attributes extracted automatically from each series’ metadata: the number of time points tested in the experiment and the disease being investigated. ReGEO also makes series searchable by other attributes available in GEO, such as platform organism, experiment type, and associated PubMed ID, as well as by general keywords in the study’s description. Our approach greatly expands the usability of GEO data and demonstrates a credible way to improve the utility of the vast amount of publicly available data in the era of Big Data research.
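As an illustration of the kind of extraction the abstract describes, the following is a minimal sketch, not the authors' method. It assumes that each sample description is a free-text string, that time points appear as a number followed by a time unit, and that diseases can be matched against a small vocabulary. The function names, the regular expression, and the vocabulary are all hypothetical.

```python
import re

# Hypothetical pattern: a number followed by a time unit, e.g. "6 h" or "2 days".
# Forms such as "day 3" or "T0" are not covered by this sketch.
TIME_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*"
    r"(h|hr|hrs|hour|hours|d|day|days|min|mins|minute|minutes|wk|week|weeks)\b",
    re.IGNORECASE,
)

# Conversion to hours, keyed by the first letter of the matched unit.
HOURS_PER_UNIT = {"h": 1.0, "d": 24.0, "m": 1.0 / 60.0, "w": 168.0}


def count_time_points(sample_descriptions):
    """Count distinct time points mentioned across a series' sample descriptions."""
    points = set()
    for text in sample_descriptions:
        for value, unit in TIME_PATTERN.findall(text):
            # Normalize to hours so that "1 day" and "24 h" count as one time point.
            points.add(float(value) * HOURS_PER_UNIT[unit[0].lower()])
    return len(points)


def find_disease(text, vocabulary):
    """Return the first vocabulary term found in the text, or None."""
    for term in vocabulary:
        if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            return term
    return None


# Made-up sample descriptions, for illustration only.
samples = [
    "Blood, 0 h post infection",
    "Blood, 6 hours post infection",
    "Blood, 1 day post infection",
    "Blood, 24 h post infection, replicate 2",
]
n_points = count_time_points(samples)
disease = find_disease(
    "Time course of influenza infection in human blood",
    ["asthma", "influenza"],
)
```

A real pipeline would need a curated disease vocabulary and far more robust handling of how time points are written. The sketch only shows how two searchable attributes could be derived from unstructured text.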

Subjects by Vocabulary

Microsoft Academic Graph classification: Restructuring, Genome dynamics, Reuse, Information retrieval, Gene Expression Omnibus, GEO database, Usability, Big data, Metadata, Computer science

Subjects

Data Mining, Database Management Systems, Databases, Genetic, Gene Expression, Metadata, Metagenomics, Database Tool, General Agricultural and Biological Sciences, General Biochemistry, Genetics and Molecular Biology, Information Systems

Funded by
NIH | Estimation Methods for Nonlinear ODE Models in AIDS Research
  • Funder: National Institutes of Health (NIH)
  • Project Code: 1R01AI087135-01
  • Funding stream: NATIONAL INSTITUTE OF ALLERGY AND INFECTIOUS DISEASES
Related to Research communities
Digital Humanities and Cultural Heritage