Transcription factor expression as a predictor of colon cancer prognosis: a machine learning practice

RESEARCH

Open Access

Jiannan Liu1†, Chuanpeng Dong1,2†, Guanglong Jiang1,2,3, Xiaoyu Lu1,2, Yunlong Liu2,3 and Huanmei Wu1,4*

From The International Conference on Intelligent Biology and Medicine (ICIBM) 2019, Columbus, OH, USA. 9-11 June 2019

Abstract

Background: Colon cancer is one of the leading causes of cancer deaths in the USA and around the world. Molecular-level characteristics, such as gene expression levels and mutations, may provide valuable information for precision treatment beyond pathological indicators. Transcription factors function as critical regulators in all aspects of cell life, but transcription factor-based biomarkers for colon cancer prognosis remain rare and are needed.

Methods: We implemented an innovative process to select transcription factor variables and evaluate their prognostic prediction power by combining the Cox PH model with the random forest algorithm. We picked the five top-ranked transcription factors and built a prediction model using Cox PH regression. Using Kaplan-Meier analysis, we validated our predictive model on four independent publicly available datasets (GSE39582, GSE17536, GSE37892, and GSE17537) from the GEO database, consisting of 925 colon cancer patients.

Results: A predictive model for colon cancer prognosis based on five transcription factors was developed using TCGA colon cancer patient data. The five transcription factors identified for the predictive model are HOXC9, ZNF556, HEYL, HOXC4 and HOXC6. The prediction power of the model is validated with four GEO datasets consisting of 1584 patient samples. Kaplan-Meier curves and log-rank tests were produced for both the training and validation datasets, and the difference in overall survival time between the predicted low-risk and high-risk groups can be clearly observed. Gene set enrichment analysis was performed to further investigate the differences between the low-risk and high-risk groups at the gene pathway level, and the biological meaning was interpreted. Overall, our results show that our prediction model has strong predictive power for colon cancer prognosis.
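The scoring and validation steps named in the Methods can be sketched in a few lines. What follows is a minimal illustration, not the authors' code: it assumes the risk score is the Cox PH linear predictor over the five transcription factors, that patients are split into low-risk and high-risk groups at the cohort median (the abstract does not state the cutoff), and that the groups are compared with a two-group log-rank test. The coefficient values, the function names and the toy cohort are all hypothetical, and the random forest variable-selection step is not shown. Only the Python standard library is used.

```python
import math
from statistics import median

# Hypothetical Cox PH coefficients: placeholders, NOT the fitted values.
COEFS = {"HOXC9": 0.30, "ZNF556": 0.20, "HEYL": 0.25, "HOXC4": 0.15, "HOXC6": 0.10}

def risk_score(expr):
    """Cox PH linear predictor: sum of coefficient * expression level."""
    return sum(COEFS[gene] * expr[gene] for gene in COEFS)

def split_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median (assumed cutoff)."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square with 1 d.f., p-value)."""
    diff, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == "high")
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g == "high")
        diff += d1 - d * n1 / n          # observed minus expected deaths in 'high'
        if n > 1:                        # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero; both groups need patients at risk")
    chi2 = diff ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 d.f.

if __name__ == "__main__":
    # Toy cohort: expression of all five genes set to one value per patient.
    levels = [0.5, 1.0, 2.0, 3.0, 0.8, 2.5]
    times = [60, 48, 20, 12, 55, 18]     # months of follow-up (made up)
    events = [0, 1, 1, 1, 0, 1]          # 1 = death observed, 0 = censored
    scores = [risk_score({g: v for g in COEFS}) for v in levels]
    groups = split_by_median(scores)
    print(groups)
    print(kaplan_meier(times, events))
    print(logrank_test(times, events, groups))
```

A survival-analysis package would normally replace the hand-written estimator and test; they are spelled out here only to make the arithmetic behind the Kaplan-Meier curve and the log-rank statistic visible.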

* Correspondence: [email protected]

† Jiannan Liu and Chuanpeng Dong contributed equally to this work.

1 Department of BioHealth Informatics, School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA

4 Temple University College of Public Health, Philadelphia, PA, USA

Full list of author information is available at the end of the article

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creativ