Leveraging TCGA gene expression data to build predictive models for cancer drug response
RESEARCH
Open Access
Evan A. Clayton¹†, Toyya A. Pujol²†, John F. McDonald¹ and Peng Qiu³*
From The Sixth International Workshop on Computational Network Biology: Modeling, Analysis, and Control (CNB-MAC 2019), Niagara Falls, NY, USA, 07 September 2019
* Correspondence: peng.qiu@bme.gatech.edu
† Evan A. Clayton and Toyya A. Pujol are co-first authors.
³ Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 950 Atlantic Dr NW, Atlanta, GA 30332-0230, USA (tel. 404-385-1656)
Full list of author information is available at the end of the article
Abstract
Background: Machine learning has been used to predict cancer drug response from multi-omics data paired with the measured sensitivities of cancer cell lines to different therapeutic compounds. Here, we build machine learning models that use gene expression data from patients' primary tumor tissues to predict whether a patient will respond positively or negatively to two chemotherapeutics: 5-Fluorouracil and Gemcitabine.
Results: We focused on 5-Fluorouracil and Gemcitabine because, under our exclusion criteria, they provide the largest numbers of patients within TCGA. Normalized gene expression data were clustered and used as the input features for the study. We used matching clinical trial data to ascertain each patient's response, which served as the prediction target. Multiple clustering and classification methods were compared by the accuracy of their drug response predictions; Clara and random forest were found to be the best clustering and classification methods, respectively. The results show that our models predict drug response with up to 86% accuracy, despite the study's limited sample size. We also found that the genes most informative for predicting drug response were enriched in well-known cancer signaling pathways, highlighting their potential significance in chemotherapy prognosis.
Conclusions: Primary tumor gene expression is a good predictor of cancer drug response. Investment in larger datasets containing both patient gene expression and drug response data is needed to support future work on machine learning models. Ultimately, such predictive models may aid oncologists in making critical treatment decisions.
Keywords: Personalized oncology, Machine learning, Drug response, Predictive models
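The abstract only summarizes the pipeline: cluster normalized gene expression, use the clusters as input features, and classify patients as responders or non-responders, scoring each method by accuracy. The following is a minimal, hypothetical Python sketch of that idea under loudly stated assumptions, not the authors' implementation. The expression matrix and response labels are synthetic. Genes are clustered with a simplified CLARA (PAM run on random subsamples, with each candidate set of medoids scored on all genes). Each cluster is turned into one feature per patient by averaging expression over its genes, which is an assumption because the abstract does not say how clusters become features. A small ensemble of bootstrapped decision stumps with random feature subsets stands in for a full random forest. All function names (`clara`, `fit_forest`, `run`, and so on) are illustrative.

```python
import random
import statistics

def dist(a, b):
    """Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k):
    """Simplified PAM: swap a medoid for a non-medoid while the total cost drops."""
    medoids = list(points[:k])
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in points:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids

def clara(points, k, n_samples=5, sample_size=20, seed=0):
    """CLARA-style: run PAM on random subsamples, keep the medoids best on ALL points."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        medoids = pam(rng.sample(points, min(sample_size, len(points))), k)
        cost = total_cost(points, medoids)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    # Label every point with the index of its nearest medoid.
    return [min(range(k), key=lambda j: dist(p, best_medoids[j])) for p in points]

def fit_stump(X, y, features):
    """Depth-1 tree: the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for pol in (0, 1):
                errors = sum((pol if row[f] > t else 1 - pol) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, pol)
    return best[1:]

def fit_forest(X, y, n_trees=51, m_try=2, seed=0):
    """Bagged stumps: each tree gets a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feats = rng.sample(range(len(X[0])), m_try)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote of the stumps (1 = responder, 0 = non-responder)."""
    votes = sum(pol if row[f] > t else 1 - pol for f, t, pol in forest)
    return 1 if 2 * votes > len(forest) else 0

def make_data(n_genes=60, n_patients=60, seed=1):
    """Synthetic genes x patients matrix: 3 co-expressed modules; only module 0 tracks y."""
    rng = random.Random(seed)
    y = [s % 2 for s in range(n_patients)]  # alternating synthetic response labels
    base = [[rng.gauss(0, 1) for _ in range(n_patients)] for _ in range(3)]
    expr = [[base[g % 3][s] + (3.0 * y[s] if g % 3 == 0 else 0.0) + rng.gauss(0, 0.3)
             for s in range(n_patients)] for g in range(n_genes)]
    return expr, y

def run(k=3, n_train=40):
    expr, y = make_data()
    labels = clara(expr, k)  # cluster genes by their profiles across patients
    # One feature per gene cluster: the patient's mean expression over that cluster.
    X = [[statistics.fmean(expr[g][s] for g in range(len(expr)) if labels[g] == c)
          for c in range(k)] for s in range(len(y))]
    forest = fit_forest(X[:n_train], y[:n_train])
    test = range(n_train, len(y))
    acc = sum(predict(forest, X[s]) == y[s] for s in test) / len(test)
    return labels, acc

if __name__ == "__main__":
    labels, acc = run()
    print("gene cluster sizes:", [labels.count(c) for c in range(3)])
    print("held-out accuracy:", acc)
```

Running the script prints the gene cluster sizes and the held-out accuracy on the synthetic data. These numbers say nothing about the TCGA results. For real data one would normally use established implementations, such as `clara` from R's `cluster` package or scikit-learn's `RandomForestClassifier`, rather than this hand-rolled sketch.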
© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory r