A snapshot neural ensemble method for cancer-type prediction based on copy number variations


ARTIFICIAL INTELLIGENCE INTERNATIONAL CONFERENCE - A2IC 2018

Md. Rezaul Karim1,2 • Ashiqur Rahman2 • João Bosco Jares1 • Stefan Decker1,2 • Oya Beyan1,2

Received: 18 January 2019 / Accepted: 22 November 2019 © The Author(s) 2019

Abstract Accurate diagnosis and prognosis of cancer are specific to patients with particular cancer types and molecular traits, which need to be addressed carefully. The discovery of important biomarkers is becoming a key step toward understanding the molecular mechanisms of carcinogenesis, in which genomics data and clinical outcomes need to be analyzed before making any clinical decision. Copy number variations (CNVs) are found to be associated with the risk of individual cancers and hence can be used to reveal genetic predispositions before cancer develops. In this paper, we collect CNV data for about 8000 cancer patients covering 14 different cancer types from The Cancer Genome Atlas. We then prepare two different sparse representations of CNVs, based on 578 oncogenes and 20,308 protein-coding genes, including genomic deletions and duplications across the samples. Next, we train Conv-LSTM and convolutional autoencoder (CAE) networks using both representations and create snapshot models. While the Conv-LSTM can capture locally and globally important features, the CAE can use unsupervised pretraining to initialize the weights of the subsequent convolutional layers to cope with the sparsity. A model averaging ensemble (MAE) is then applied to combine the snapshot models into a single prediction. Finally, we identify the most significant CNV biomarkers using guided-gradient class activation maps (GradCAM++) and rank the top genes for different cancer types. Results covering several experiments show fairly high prediction accuracies for the majority of cancer types. In particular, using protein-coding genes, the Conv-LSTM and CAE networks predict cancer types correctly in at least 72.96% and 76.77% of the cases, respectively. In contrast, using oncogenes gives moderately higher accuracies of 74.25% and 78.32%, respectively, whereas the snapshot model based on MAE shows an overall accuracy improvement of 2.5%.
Keywords Cancer prediction • Copy number variation • Conv-LSTM network • Convolutional autoencoder • Interpretability • Snapshot ensemble
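
The model averaging step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each snapshot model outputs a matrix of class probabilities and that the ensemble takes their unweighted mean; the function name `model_average` and the toy probabilities are hypothetical and are not taken from the paper.

```python
import numpy as np


def model_average(snapshot_probs):
    """Average class probabilities over snapshot models (assumed unweighted).

    snapshot_probs: list of arrays, each of shape (n_samples, n_classes),
    one array per snapshot model.
    Returns the averaged probabilities and the predicted class indices.
    """
    stacked = np.stack(snapshot_probs, axis=0)  # (n_models, n_samples, n_classes)
    mean_probs = stacked.mean(axis=0)           # average over the model axis
    return mean_probs, mean_probs.argmax(axis=1)


# Toy example: 3 hypothetical snapshots, 2 samples, 3 classes.
snapshots = [
    np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]),
    np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]),
    np.array([[0.7, 0.2, 0.1], [0.2, 0.2, 0.6]]),
]
mean_probs, labels = model_average(snapshots)
print(labels)
```

In this toy case the first snapshot alone would assign the second sample to class 1, while the averaged prediction follows the other two snapshots and picks class 2, which is the kind of disagreement that averaging is meant to smooth out.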

Correspondence: Md. Rezaul Karim, [email protected]

1 Fraunhofer Institute for Applied Information Technology FIT, Schloss Birlinghoven, Sankt Augustin, Germany

2 RWTH Aachen University, Aachen, Germany

1 Introduction Cancer results from highly expressed genes due to mutations or alterations in the gene regulation that controls cell division and cell growth. In such cases, a set of genes called oncogenes contributes to the conversion of normal cells into cancerous cells. The change in the structure of occurring genetic aberrations, such as somatic mutations, copy numbers (CNs), profiles, and different epigenetic alteration
