Latent-space embedding of expression data identifies gene signatures from sputum samples of asthmatic patients


RESEARCH ARTICLE

Open Access

Shaoke Lou¹,²†, Tianxiao Li¹,²†, Daniel Spakowicz¹,²,³, Xiting Yan⁴, Geoffrey Lowell Chupp⁴ and Mark Gerstein¹,²*

*Correspondence: [email protected]
† Shaoke Lou and Tianxiao Li have contributed equally to this work.
¹ Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
Full list of author information is available at the end of the article.

Abstract

Background: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Autoencoders potentially provide ideal frameworks for such tasks. They can embed complex, noisy, high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, which lets us extract distinguishing features from expression data.

Results: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well to previously defined, clinically relevant clusters of patients. Moreover, every gene contributes to each hidden unit. Pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used the genes that contribute most to the hidden units to build a secondary random-forest classifier that directly predicts asthma severity. The feature-importance metric from this classifier identified a signature of 50 key genes associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients via support-vector-machine regression.

Conclusion: We found that the denoising autoencoder framework can extract meaningful patterns from the gene expression of asthma patients. These patterns correspond to functional gene groups and patient clusters.

Keywords: Asthma, Asthma subtypes, Denoising autoencoder, Biomarker, Noninvasive
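The abstract describes the first stage of the framework only at a high level. The following NumPy sketch illustrates that stage. It trains a one-hidden-layer denoising autoencoder with 50 hidden units, embeds each sample in the latent space, and ranks genes for each hidden unit by the magnitude of their encoder weights.

Only the 50 hidden units come from the abstract. Everything else is an assumption for illustration and is not taken from the article:

- the masking-noise corruption, sigmoid encoder and linear decoder;
- the learning rate, epoch count and `top_k` cutoff;
- using |encoder weight| as a gene's "contribution";
- the helper names `train_denoising_autoencoder`, `embed` and `top_contributing_genes`, which are hypothetical.

The input is synthetic, not the sputum expression data.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_denoising_autoencoder(X, n_hidden=50, mask_frac=0.1, lr=0.02,
                                epochs=300, seed=0):
    """Full-batch gradient descent on a one-hidden-layer denoising autoencoder.

    X is an (n_samples, n_genes) matrix of standardized expression values.
    Assumptions for illustration: masking noise (each entry zeroed with
    probability mask_frac), sigmoid encoder, linear decoder, and a loss equal
    to the squared error against the *clean* input, summed over genes and
    averaged over samples.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, size=(d, n_hidden))   # encoder: genes x hidden units
    b = np.zeros(n_hidden)
    V = rng.normal(0.0, 0.1, size=(n_hidden, d))   # decoder: hidden units x genes
    c = np.zeros(d)
    losses = []
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) >= mask_frac)  # corrupt the input
        H = sigmoid(X_noisy @ W + b)                # latent code of the noisy input
        err = H @ V + c - X                         # reconstruction error vs clean X
        losses.append(float(np.sum(err ** 2) / n))
        # Backpropagate through decoder and encoder (V is still the old value here).
        g_out = 2.0 * err / n
        g_pre = (g_out @ V.T) * H * (1.0 - H)      # gradient at encoder pre-activation
        W -= lr * (X_noisy.T @ g_pre)
        b -= lr * g_pre.sum(axis=0)
        V -= lr * (H.T @ g_out)
        c -= lr * g_out.sum(axis=0)
    return W, b, losses


def embed(X, W, b):
    """Low-dimensional embedding of clean (uncorrupted) samples."""
    return sigmoid(X @ W + b)


def top_contributing_genes(W, gene_names, top_k=5):
    """Union over hidden units of the top_k genes by |encoder weight| (hypothetical criterion)."""
    selected = set()
    for k in range(W.shape[1]):
        order = np.argsort(-np.abs(W[:, k]))[:top_k]
        selected.update(gene_names[i] for i in order)
    return sorted(selected)


# Synthetic stand-in for a samples x genes expression matrix (not real data).
rng = np.random.default_rng(1)
n_samples, n_genes = 80, 200
programs = rng.normal(size=(n_samples, 3))            # 3 hidden "pathway" activities
loadings = rng.normal(size=(3, n_genes)) * (rng.random((3, n_genes)) < 0.1)
X = programs @ loadings + 0.5 * rng.normal(size=(n_samples, n_genes))
X = (X - X.mean(axis=0)) / X.std(axis=0)              # standardize each gene
gene_names = np.array([f"gene_{i}" for i in range(n_genes)])

W, b, losses = train_denoising_autoencoder(X, n_hidden=50)
embedding = embed(X, W, b)                            # (80, 50) latent representation
key_genes = top_contributing_genes(W, gene_names, top_k=5)
print(embedding.shape, len(key_genes), losses[-1] < losses[0])
```

The remaining steps named in the abstract are not reproduced here, and this sketch does not reflect the authors' settings:

- hierarchical clustering of `embedding`;
- a random-forest severity classifier and support-vector regression of FEV1/FVC on the selected genes. Off-the-shelf implementations such as scikit-learn's `RandomForestClassifier` (via its `feature_importances_`) and `SVR` could fill these roles.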

Background

Asthma is a common chronic disease of the airways. According to a medical expenditure survey conducted in the United States from 2008 to 2013, asthma has a prevalence of 4.8% and imposes a significant economic burden, including costs due to missed work

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
