Gene Expression-Based Supervised Classification Models for Discriminating Early- and Late-Stage Prostate Cancer


RESEARCH ARTICLE

Rajesh Kumar¹ • Prateek Bhanti² • Avinash Marwal³ • R. K. Gaur⁴

Received: 22 August 2018 / Revised: 30 June 2019 / Accepted: 16 July 2019 © The National Academy of Sciences, India 2019

Abstract Prostate cancer is one of the most prominent cancers affecting the male population throughout the world. Detecting the cancer at an early stage is crucial for effective treatment of the disease. Machine learning algorithms learn from a given dataset and make predictions without being explicitly programmed. Applied to gene expression data, machine learning can be used to discriminate cancer stage rather than relying on tissue histology and the various other diagnostic methods used in prostate cancer detection. In this study, the authors have developed supervised classifiers for detecting early- and late-stage prostate cancer using RNA sequencing-based gene expression data collected from The Cancer Genome Atlas. The supervised learning algorithms Naive Bayes, stochastic gradient descent, J48, Random Forest, and Multilayer Perceptron were employed with the subset of 276 most informative features extracted from the gene expression data. The accuracies of the developed models were evaluated using tenfold cross-validation. Among all the trained classifiers, the stochastic gradient descent-based classifier performed best, with an accuracy of 86.91%, a sensitivity of 86.9%, and an area under the receiver operating characteristic curve of 0.656. Gene Ontology and KEGG pathway enrichment analyses of these 276 gene features were also performed to functionally categorize the genes.
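To make the three evaluation metrics named in the abstract concrete, the following is a minimal Python sketch of how accuracy, sensitivity, and area under the ROC curve can be computed from a classifier's outputs. It is not the authors' code (the study used WEKA). The function names, the toy labels and scores, the 0.5 decision threshold, and the choice of late stage as the positive class are all assumptions made here for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """True-positive rate, TP / (TP + FN), for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration: 1 = late stage (assumed positive), 0 = early stage.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5 on the classifier score.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:   ", accuracy(y_true, y_pred))
print("sensitivity:", sensitivity(y_true, y_pred))
print("ROC AUC:    ", roc_auc(y_true, scores))
```

Accuracy and sensitivity depend on the predicted labels at a single threshold, whereas the AUC depends only on how the scores rank the samples, so the three values need not move together.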

Significance Statement In this work, the authors have used TCGA gene expression data and machine learning techniques to classify prostate cancer as early- or late-stage. Using the TCGA gene expression data, the authors identified the most informative subset of gene features and used the expression of these features to classify prostate cancer stage. The authors have shown that machine learning-based prediction methods can be a substitute for histology-based cancer-stage determination.

Keywords Prostate cancer • Early–late-stage classification • Machine learning • WEKA

✉ R. K. Gaur [email protected]

¹ Department of Biosciences, School of Sciences, Mody University of Science and Technology, Lakshmangarh, Sikar, Rajasthan 332311, India

² Department of Computer Science and Engineering, School of Engineering and Technology, Mody University of Science and Technology, Lakshmangarh, Sikar, Rajasthan 332311, India

³ Department of Biotechnology, Mohanlal Sukhadia University, Vigyan Bhawan – Block B, Main Campus, Udaipur, Rajasthan 313001, India

⁴ Department of Biotechnology, Deen Dayal Upadhyay Gorakhpur University, Gorakhpur, Uttar Pradesh 273009, India

Introduction Prostate cancer is one of the leading malignancies by number of cases, second only to lung cancer, in the male population worldwide [1]. Over a million people are affected by this disease worldwide