ORIGINAL ARTICLE
Simultaneous feature selection and clustering of micro‑array and RNA‑sequence gene expression data using multiobjective optimization
Abhay Kumar Alok¹ · Pooja Gupta² · Sriparna Saha¹ · Vineet Sharma²
Received: 22 June 2019 / Accepted: 2 May 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract In this paper, we have devised a multiobjective optimization framework for clustering gene expression data in a reduced feature space. The clustering problem is viewed from two aspects: clustering genes in a reduced sample space, or clustering samples in a reduced gene space. Three objective functions, namely two internal cluster validity indices and the number of selected features, are optimized simultaneously by AMOSA, a popular multiobjective simulated-annealing-based approach. A point symmetry based distance is used to assign gene data points to clusters. Seven publicly available benchmark gene expression data sets are used in the experiments, and both aspects of clustering in reduced feature space are demonstrated. The proposed gene expression clustering technique outperforms nine existing clustering techniques. In addition, statistical and biological significance tests have been carried out to show that the proposed FSC-MOO technique produces results that are more statistically and biologically enriched.
Keywords Gene expression data clustering · Feature selection · Point symmetry based distance · Multiobjective optimization · Cluster validity index
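To make the abstract's two ingredients concrete, the sketch below evaluates one hypothetical candidate solution: a boolean feature mask plus a set of cluster centres. It builds the reduced view (genes as points over the selected samples, or samples as points over the selected genes), assigns each point to the centre with the smallest point-symmetry (PS) distance, and reports the feature count that serves as one of the three objectives. The PS distance is assumed to take the form common in earlier point-symmetry clustering work: reflect x through the centre c, average the distances from the reflected point 2c − x to its knear = 2 nearest data points, and multiply by the Euclidean distance between x and c. This section does not restate that definition, so treat it as an assumption. The function names (`reduced_view`, `ps_distance`, `evaluate_candidate`) are hypothetical, and the sketch omits the AMOSA search itself, the two validity indices, and refinements found in published PS-based schemes such as thresholded fallback to Euclidean assignment and kd-tree neighbour search.

```python
import numpy as np

def reduced_view(expr, mask, cluster_genes=True):
    """Return the points to cluster in a reduced feature space (sketch).

    expr is a genes x samples matrix. When clustering genes, each row is a
    point and `mask` selects sample columns; when clustering samples, each
    column is a point and `mask` selects gene rows.
    """
    mask = np.asarray(mask, dtype=bool)
    if cluster_genes:
        return expr[:, mask]
    return expr[mask, :].T

def ps_distance(x, c, points, knear=2):
    """Assumed point-symmetry-based distance of x with respect to centre c.

    Reflect x through c, average the distances from the reflected point to
    its `knear` nearest neighbours in `points`, and scale the result by the
    Euclidean distance between x and c.
    """
    reflected = 2.0 * c - x
    dists = np.sort(np.linalg.norm(points - reflected, axis=1))
    return dists[:knear].mean() * np.linalg.norm(x - c)

def evaluate_candidate(expr, mask, centres, cluster_genes=True, knear=2):
    """Assign every point to the centre with minimum PS distance.

    `centres` must live in the reduced space (one column per selected
    feature). Returns the cluster labels and the number of selected features.
    """
    points = reduced_view(expr, mask, cluster_genes)
    centres = np.asarray(centres, dtype=float)
    labels = np.array([
        int(np.argmin([ps_distance(x, c, points, knear) for c in centres]))
        for x in points
    ])
    n_features = int(np.count_nonzero(mask))
    return labels, n_features
```

For example, with eight genes forming two symmetric groups around (0, 0) and (10, 10) in the first two samples and an uninformative third sample, masking out the third sample and passing those two centres assigns the first four genes to cluster 0 and the last four to cluster 1, with a feature count of 2. Brute-force neighbour search keeps the sketch short; a spatial index would be the natural choice for real gene expression matrices.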
Electronic supplementary material The online version of this article (https://doi.org/10.1007/s13042-020-01139-x) contains supplementary material, which is available to authorized users.
* Abhay Kumar Alok (corresponding author)
1 Computer Science Engineering, Indian Institute of Technology, Patna, India
2 Computer Science Engineering, Krishna Institute of Engineering and Technology, AKTU, Ghaziabad, Lucknow, India
1 Introduction
Gene expression data are represented as a large matrix whose rows correspond to genes (their expression levels) and whose columns correspond to different experimental conditions. Clustering of gene expression data can be carried out in two different spaces: gene space or sample space [24, 25, 31, 32]. In [11, 23], it has been noted that appropriate sample selection helps to obtain a low-level visual representation of gene behavior across the samples. Such dimensionality reduction in sample space helps to tackle the problem of determining a low-dimensional embedding that provides a precise visual representation of gene-gene interactions. Inspired by this observation in [23], a feature selection technique is proposed to reduce the number of samples in a given gene expression data set. The identified co-expressed genes are highly symmetrical, overlapping, and high-dimensional in nature. Most single-objective clustering techniques fail to evolve