Publication . Article . 2018

Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis.

Guocai Chen; Juan Camilo Ramírez; Nan Deng; Xing Qiu; Canglin Wu; W. Jim Zheng; Hulin Wu
Open Access
Published: 24 Feb 2018
Journal: Database: the journal of biological databases and curation, volume 2019 (ISSN: 1758-0463)

Abstract

Motivation: Gene Expression Omnibus (GEO) and other publicly available data sources store their metadata as unstructured English text, which makes automated reuse very difficult.

Results: We employed text mining techniques to analyze the metadata of GEO and developed the Restructured GEO database (ReGEO). ReGEO reorganizes and categorizes GEO series and makes them searchable by two new attributes extracted automatically from each series’ metadata: the number of time points tested in the experiment and the disease being investigated. ReGEO also makes series searchable by other attributes available in GEO, such as platform organism, experiment type and associated PubMed ID, as well as by general keywords in the study’s description. Our approach greatly expands the usability of GEO data and demonstrates a credible way to improve the utility of the vast amount of publicly available data in the era of Big Data research.
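The abstract does not describe how the two new attributes are extracted, so the following is only a minimal sketch of the general idea, not ReGEO's actual pipeline. It assumes that time points appear in sample descriptions as a number followed by a time unit, and that diseases can be found by matching against a small vocabulary. The function names, the unit table and the vocabulary are all hypothetical.

```python
import re

# Hypothetical unit table: converts each recognised unit to hours so that
# "1 day" and "24 h" count as the same time point.
_UNIT_HOURS = {
    "min": 1 / 60, "minute": 1 / 60, "minutes": 1 / 60,
    "h": 1, "hr": 1, "hrs": 1, "hour": 1, "hours": 1,
    "d": 24, "day": 24, "days": 24,
    "wk": 168, "week": 168, "weeks": 168,
}

# Longest units first so that "hours" is tried before "h".
_UNITS = "|".join(sorted(_UNIT_HOURS, key=len, reverse=True))
_TIME_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(" + _UNITS + r")\b", re.IGNORECASE)


def count_time_points(sample_texts):
    """Count distinct time points mentioned across free-text sample descriptions."""
    points = set()
    for text in sample_texts:
        for value, unit in _TIME_RE.findall(text):
            hours = float(value) * _UNIT_HOURS[unit.lower()]
            points.add(round(hours, 6))
    return len(points)


def find_diseases(text, vocabulary):
    """Return the vocabulary terms that occur as whole words in the text."""
    lowered = text.lower()
    return sorted(
        term for term in vocabulary
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    )


# Made-up sample descriptions, for illustration only.
samples = [
    "influenza infection, 0 h",
    "influenza infection, 6 hours",
    "influenza infection, 1 day",
    "influenza infection, 24 h",
]
n_points = count_time_points(samples)
diseases = find_diseases(
    "Time course of influenza infection in human blood",
    {"influenza", "asthma"},
)
```

A real system would need a much richer pattern set (for example "day 3" or "t=12h") and a curated disease ontology rather than a hand-written vocabulary.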

Subjects by Vocabulary

Microsoft Academic Graph classification: Restructuring, Genome dynamics, Reuse, Information retrieval, Gene expression omnibus, Geo database, Usability, business.industry, business, Big data, Metadata, Computer science


Data Mining, Database Management Systems, Databases, Genetic, Gene Expression, Metadata, Metagenomics, Database Tool, General Agricultural and Biological Sciences, General Biochemistry, Genetics and Molecular Biology, Information Systems


Funded by
NIH| Estimation Methods for Nonlinear ODE Models in AIDS Research
  • Funder: National Institutes of Health (NIH)
  • Project Code: 1R01AI087135-01
Related to Research communities
Digital Humanities and Cultural Heritage