publication . Other literature type . Article . Preprint . 2019 . Embargo end date: 01 Jan 2019

Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review

Sheikhalishahi, Seyedmostafa; Miotto, Riccardo; Dudley, Joel T; Lavelli, Alberto; Rinaldi, Fabio; Osmani, Venet
Open Access
  • Published: 01 Apr 2019
  • Publisher: JMIR Publications
  • Country: Switzerland
Abstract
Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers identified 43 chronic diseases, which were then classified into 10 disease categories using ICD-10. The majority of studies focused on diseases of the circulatory system (n=38), while endocrine and metabolic diseases were fewest (n=14). This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which rely more on unstructured data and have consequently received more attention from NLP research. The review has shown...
Subjects
free text keywords: Review, Systematic review, electronic health records, clinical notes, chronic diseases, natural language processing, machine learning, deep learning, heart disease, stroke, cancer, diabetes, lung disease, Computer Science - Computers and Society, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Information Retrieval, Institute of Computational Linguistics, Digital Society Initiative, 000 Computer science, knowledge & systems, 410 Linguistics, Translational research, Medical record, Artificial intelligence, Population, Computer science, Disease, Unstructured data, Interpretability
Funded by
EC| WellCO
Project
WellCO
Wellbeing and Health Virtual Coach
  • Funder: European Commission (EC)
  • Project Code: 769765
  • Funding stream: H2020 | RIA
Validated by funder
Communities
Digital Humanities and Cultural Heritage