Simultaneous feature selection and clustering of micro-array and RNA-sequence gene expression data using multiobjective optimization


ORIGINAL ARTICLE

Abhay Kumar Alok1 · Pooja Gupta2 · Sriparna Saha1 · Vineet Sharma2

Received: 22 June 2019 / Accepted: 2 May 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
In this paper, we devise a multiobjective optimization framework for clustering gene expression data in a reduced feature space. The clustering problem is viewed from two aspects: clustering of genes in a reduced sample space, or clustering of samples in a reduced gene space. Three objective functions, namely two internal cluster validity indices and the number of selected features, are optimized simultaneously by a popular multiobjective simulated annealing based approach, AMOSA. A point symmetry based distance is used to assign gene data points to clusters. Seven publicly available benchmark gene expression data sets are used in the experiments, and both aspects of clustering in reduced feature space are demonstrated. The proposed gene expression clustering technique, FSC-MOO, outperforms nine existing clustering techniques. In addition, statistical and biological significance tests have been carried out to show that the results of the proposed FSC-MOO technique are more statistically significant and biologically enriched.

Keywords Gene expression data clustering · Feature selection · Point symmetry based distance · Multiobjective optimization · Cluster validity index
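The two clustering views named in the abstract can be illustrated with a minimal sketch. It assumes the expression matrix is stored with genes as rows and samples as columns, and that a candidate solution carries a Boolean feature mask. The names `reduced_view` and `feature_count` are hypothetical and are not taken from the paper. The sketch covers only the construction of the reduced feature space and the feature-count objective, not the cluster validity indices, the point symmetry based distance, or AMOSA itself.

```python
import numpy as np

def reduced_view(expr, mask, cluster="genes"):
    """Return an (objects x selected features) matrix for one clustering view.

    expr : 2-D array, rows = genes, columns = samples (assumed layout).
    mask : Boolean vector marking selected features: samples when
           clustering genes, genes when clustering samples.
    """
    expr = np.asarray(expr, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if cluster == "genes":
        data = expr          # objects are genes, features are samples
    elif cluster == "samples":
        data = expr.T        # objects are samples, features are genes
    else:
        raise ValueError("cluster must be 'genes' or 'samples'")
    if mask.shape != (data.shape[1],):
        raise ValueError("mask length must equal the number of features")
    return data[:, mask]

def feature_count(mask):
    """Number of selected features (the third objective in this sketch)."""
    return int(np.count_nonzero(mask))

# Toy data: 6 genes measured over 4 samples (random values, illustration only).
rng = np.random.default_rng(0)
expr = rng.normal(size=(6, 4))

sample_mask = np.array([True, False, True, True])
gene_mask = np.array([True, True, False, False, True, False])

genes_view = reduced_view(expr, sample_mask, "genes")      # 6 genes x 3 samples
samples_view = reduced_view(expr, gene_mask, "samples")    # 4 samples x 3 genes
```

Either view can then be passed to any clustering routine. In this sketch the mask is fixed by hand, whereas in the approach described above it would be part of the solution being optimized.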

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s13042-020-01139-x) contains supplementary material, which is available to authorized users.

* Abhay Kumar Alok [email protected]
Pooja Gupta [email protected]
Sriparna Saha [email protected]
Vineet Sharma [email protected]

1 Computer Science Engineering, Indian Institute of Technology, Patna, India
2 Computer Science Engineering, Krishna Institute of Engineering and Technology, AKTU, Ghaziabad, Lucknow, India

1 Introduction

Gene expression data are represented as a large matrix of gene expression levels (rows) across different experimental conditions (columns). Clustering of gene expression data can be carried out in two different spaces: gene space or sample space [24, 25, 31, 32]. In [11, 23], it has been mentioned that appropriate sample selection helps to obtain a low-level visual representation of gene behavior across the samples. This dimensionality reduction in sample space helps to effectively tackle the problem of determining a low-dimensional embedding that provides a precise visual representation of gene-gene interactions. Inspired by the observation in [23], a feature selection technique is proposed to reduce the number of samples in a given gene expression data set. The identified co-expressed genes are highly symmetrical, overlapping, and high-dimensional in nature. Most of the single-objective based clustering techniques fail to evolve
