Combining structured and unstructured data for predictive models: a deep learning approach
Open Access
RESEARCH ARTICLE
Dongdong Zhang1,2, Changchang Yin3, Jucheng Zeng1,2, Xiaohui Yuan2 and Ping Zhang1,3*
Abstract

Background: The broad adoption of electronic health records (EHRs) provides great opportunities to conduct health care research and solve various clinical problems in medicine. With recent advances and success, methods based on machine learning and deep learning have become increasingly popular in medical informatics. However, while many research studies utilize temporal structured data for predictive modeling, they typically neglect potentially valuable information in unstructured clinical notes. Integrating heterogeneous data types across EHRs through deep learning techniques may help improve the performance of prediction models.

Methods: In this research, we proposed 2 general-purpose multi-modal neural network architectures to enhance patient representation learning by combining sequential unstructured notes with structured data. The proposed fusion models use document embeddings to represent long clinical note documents, either convolutional neural networks or long short-term memory networks to model the sequential clinical notes and temporal signals, and one-hot encoding to represent static information. The concatenation of these representations is the final patient representation, which is used to make predictions.

Results: We evaluate the performance of the proposed models on 3 risk prediction tasks (i.e. in-hospital mortality, 30-day hospital readmission, and long length of stay prediction) using data derived from the publicly available Medical Information Mart for Intensive Care III dataset. Our results show that by combining unstructured clinical notes with structured data, the proposed models outperform other models that utilize either unstructured notes or structured data only.

Conclusions: The proposed fusion models learn better patient representations by combining structured and unstructured data. Integrating heterogeneous data types across EHRs helps improve the performance of prediction models and reduce errors.
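The fusion step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the learned CNN or LSTM encoders are replaced here by a simple max-pooling over time so that the example runs with the standard library alone, and every name in it (`one_hot`, `max_pool_over_time`, `fuse_patient`, the admission-type vocabulary, the toy values) is a hypothetical chosen for illustration.

```python
from typing import List, Sequence


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Encode one categorical static feature as a one-hot vector."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def max_pool_over_time(sequence: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a (time, features) sequence into one vector by per-feature max.

    Assumption: this is only a stand-in for a learned CNN/LSTM encoder;
    it has no trainable weights.
    """
    return [max(column) for column in zip(*sequence)]


def fuse_patient(
    static_value: str,
    vocabulary: Sequence[str],
    temporal_signals: Sequence[Sequence[float]],
    note_embeddings: Sequence[Sequence[float]],
) -> List[float]:
    """Concatenate static, temporal-signal, and note representations."""
    static_vec = one_hot(static_value, vocabulary)
    signal_vec = max_pool_over_time(temporal_signals)
    note_vec = max_pool_over_time(note_embeddings)
    return static_vec + signal_vec + note_vec  # list concatenation


# Toy patient: one static category, three time steps of two signals,
# and two notes that were already mapped to 3-dimensional embeddings.
patient_vector = fuse_patient(
    static_value="EMERGENCY",
    vocabulary=["ELECTIVE", "EMERGENCY", "URGENT"],
    temporal_signals=[[80.0, 36.5], [95.0, 37.2], [88.0, 38.1]],
    note_embeddings=[[0.1, -0.3, 0.5], [0.4, 0.2, -0.1]],
)
print(len(patient_vector))
```

In a real model the concatenated vector would be passed to a trained classifier for each prediction task; the sketch only shows how the three modalities end up in a single patient representation.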
Keywords: Electronic health records, Deep learning, Data fusion, Time series forecasting

*Correspondence: [email protected] 3 Department of Computer Science and Engineering, The Ohio State University, 2015 Neil Ave, Columbus, OH 43210, USA. Full list of author information is available at the end of the article.

Background

Electronic Health Records (EHRs) are longitudinal electronic records of patients’ health information, including structured data (patient demographics, vital signs, lab tests, etc.) and unstructured data (clinical notes and reports). In the United States, for example, over 30 million patients visit hospitals each year, and the percentage of non-Federal acute care hospitals that had adopted at least a Basic EHR system increased from 9.4% to 83.8% between 2008 and 2015 [1]. The broad adoption of EHRs provides unprecedented opp