Journal of Translational Medicine Open Access

RESEARCH

Exploration of predictive and prognostic alternative splicing signatures in lung adenocarcinoma using machine learning methods

Qidong Cai1,2, Boxue He1,2, Pengfei Zhang1,2, Zhenyu Zhao1,2, Xiong Peng1,2, Yuqian Zhang1,2, Hui Xie1,2 and Xiang Wang1,2*

Abstract

Background: Alternative splicing (AS) plays critical roles in generating protein diversity and complexity. Dysregulation of AS underlies the initiation and progression of tumors. Machine learning approaches have emerged as efficient tools to identify promising biomarkers. Exploring pivotal AS events (ASEs) with machine learning algorithms may deepen understanding of lung adenocarcinoma (LUAD) and improve its prognostic assessment.

Method: RNA sequencing data and AS data were extracted from The Cancer Genome Atlas (TCGA) database and the TCGA SpliceSeq database. Using several machine learning methods, we identified 24 pairs of LUAD-related ASEs implicated in splicing switches and constructed a random forest-based classifier, consisting of 12 ASEs, for identifying lymph node metastasis (LNM). Furthermore, we identified key prognosis-related ASEs and established a 16-ASE-based prognostic model to predict overall survival for LUAD patients using Cox regression, random survival forest analysis, and forward selection. Bioinformatics analyses were also applied to identify underlying mechanisms and associated upstream splicing factors (SFs).

Results: Each pair of ASEs was spliced from the same parent gene and exhibited perfect inverse intrapair correlation (correlation coefficient = −1). The 12-ASE-based classifier showed a robust ability to evaluate the LNM status of LUAD patients, with an area under the receiver operating characteristic (ROC) curve (AUC) greater than 0.7 in fivefold cross-validation. The prognostic model performed well at 1, 3, 5, and 10 years in both the training cohort and the internal test cohort. Univariate and multivariate Cox regression indicated that the prognostic model could be used as an independent prognostic factor for patients with LUAD. Further analysis revealed correlations between the prognostic model and American Joint Committee on Cancer stage, T stage, N stage, and living status. A splicing network constructed from survival-related SFs and ASEs depicts the regulatory relationships between them.

Conclusion: In summary, our study provides insight into LUAD research and management based on these AS biomarkers.

Keywords: Lung adenocarcinoma, Alternative splicing, Random forests, Splicing switch, Metastasis, Prognosis
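
Two quantities reported in the abstract, the intrapair correlation coefficient and the AUC, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. All values are made up for illustration, the helper names `pearson` and `auc` are hypothetical, and the sketch assumes that the two events of a splicing-switch pair have inclusion levels (PSI) that sum to 1 in every sample, which is one simple way a correlation of exactly −1 can arise.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical PSI values of one ASE across five samples (assumption).
psi_a = [0.10, 0.35, 0.50, 0.80, 0.95]
# Assumed complementary event from the same parent gene: PSI_b = 1 - PSI_a.
psi_b = [1.0 - v for v in psi_a]
r = pearson(psi_a, psi_b)  # very close to -1 under this assumption

# Hypothetical LNM labels (1 = metastasis) and classifier scores (assumption).
lnm_labels = [0, 0, 1, 0, 1, 1]
lnm_scores = [0.20, 0.40, 0.35, 0.10, 0.80, 0.70]
lnm_auc = auc(lnm_labels, lnm_scores)

print(f"intrapair correlation: {r:.3f}")
print(f"AUC on toy scores: {lnm_auc:.3f}")
```

In a fivefold cross-validation setting, a score like `lnm_auc` would be computed once per held-out fold, with the classifier refitted on the remaining four folds each time.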

*Correspondence: [email protected]
1 Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha 410011, Hunan, China
Full list of author information is available at the end of the article

Background

Lung cancer is the most common and deadliest cancer worldwide, and non-small cell lung cancer (NSCLC) accounts for 85% of all cases [1, 2]. NSCLC can be mainly

© The Author(s) 2020. This article is licensed under a Creative Com