Cancer molecular subtype classification from hypervolume-based discrete evolutionary optimization


S.I.: 2018 INDIA INTL. CONGRESS ON COMPUTATIONAL INTELLIGENCE

Yunhe Wang1 · Shaochuan Li1 · Lei Wang1 · Zhiqiang Ma1 · Xiangtao Li2

Received: 18 December 2019 / Accepted: 6 March 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract The high dimensionality and sample imbalance of gene expression data motivate the development of effective algorithms for classifying such data. To improve the ability to distinguish different subtypes of gene expression data, we devise a hypervolume-based discrete evolutionary optimization algorithm (HYBDEOA) in this paper. Four objectives, namely the number of genes, the accuracy, the relevance, and the redundancy, are optimized simultaneously to guide the evolution. First, binary encoding is used to select a subset of features, projecting the data onto different subspaces. Next, a discrete neighborhood operation is conducted to generate a new binary-mapped population. After combining the new population with the current population, we employ a hypervolume-based mechanism to select the Pareto solutions. Finally, a discrete mutation method is proposed to find promising solutions in the binary search space. To demonstrate its performance, we apply HYBDEOA to 55 synthetic datasets and 35 cancer gene expression datasets. Extensive experiments are also conducted to reveal the effectiveness and efficiency of HYBDEOA. The experimental results demonstrate that the proposed method is a parameter-less and robust algorithm that can group gene expression data into a finer and more informative classification.

Keywords Classification · Multiobjective optimization · Animal migration optimization algorithm · Gene expression data
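To make the ingredients named in the abstract concrete, the following minimal Python sketch illustrates binary-mask feature selection, a generic bit-flip mutation, Pareto dominance, and a hypervolume indicator. It is an illustration under stated assumptions, not the HYBDEOA implementation: all function names are hypothetical, the mutation is plain bit flipping rather than the discrete mutation proposed in the paper, and the hypervolume is computed for only two minimized objectives (for instance, the number of genes and the classification error) instead of the four objectives used by the method.

```python
import random


def project(samples, mask):
    """Keep only the columns (genes) whose mask bit is 1."""
    return [[x for x, bit in zip(row, mask) if bit] for row in samples]


def bit_flip(mask, rate, rng):
    """Flip each bit independently with probability `rate`.

    Assumption: a generic discrete mutation, used here only as a stand-in.
    """
    return [1 - b if rng.random() < rate else b for b in mask]


def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by the reference point `ref`.

    Assumption: exactly two objectives, both minimized.
    """
    front = sorted(p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1])
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        # Each front point adds a rectangle between its own y and the
        # previous (larger) y, extending right to the reference point.
        area += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return area


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[1, 2, 3], [4, 5, 6]]          # two samples, three genes
    mask = [1, 0, 1]                        # select genes 0 and 2
    print(project(data, mask))
    print(bit_flip(mask, 0.5, rng))
    objectives = [(1, 3), (2, 2), (3, 1), (3, 3)]
    print(pareto_front(objectives))
    print(hypervolume_2d(objectives, ref=(4, 4)))
```

A larger hypervolume indicates a solution set that is both closer to the ideal point and better spread, which is why it can serve as a selection criterion when the current and new populations are merged.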

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00521-020-04846-2) contains supplementary material, which is available to authorized users.

✉ Xiangtao Li [email protected]
Yunhe Wang [email protected]

1 School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
2 School of Artificial Intelligence, Jilin University, Changchun 130012, China

1 Introduction DNA microarray technology has shown great potential in cancer diagnosis and classification [1, 2]. It is used to identify up-regulated or down-regulated genes that play an important role in specific cancers. However, such an approach is expensive, time-consuming, and impractical to apply clinically to every patient. Moreover, grouping genes into different subtypes is well known to be a critical step when analyzing gene expression data. Therefore, a plethora of approaches have been proposed for identifying cancer molecular subtypes. In [3], a novel analysis procedure was investigated to classify human tumor samples using microarray gene expression data. It reduces the feature dimension using partial least squares. In [4], a new t