Semi-supervised clustering for gene-expression data in multiobjective optimization framework


ORIGINAL ARTICLE

Abhay Kumar Alok • Sriparna Saha • Asif Ekbal

Received: 10 June 2014 / Accepted: 18 January 2015
© Springer-Verlag Berlin Heidelberg 2015

A. K. Alok (✉) • S. Saha • A. Ekbal
Computer Science Engineering, Indian Institute of Technology, Patna, India
A. K. Alok e-mail: [email protected]
S. Saha e-mail: [email protected]
A. Ekbal e-mail: [email protected]

Abstract

Studying the patterns hidden in gene expression data helps in understanding the functionality of genes. However, because of the large number of genes and the complexity of biological networks, it is difficult to study the resulting mass of data, which often consists of millions of measurements. Clustering techniques are applied to reveal natural structures and to identify interesting patterns in a given gene expression data set. Semi-supervised classification is a new direction in machine learning that requires a large amount of unlabeled data and a small amount of labeled data, and in general it performs better than unsupervised classification. To the best of our knowledge, however, no existing work solves the gene expression data clustering problem using semi-supervised classification techniques. In this paper we attempt to solve the gene expression data clustering problem using a multiobjective optimization based semi-supervised classification technique, with the aim of attaining good-quality partitions from few labeled data. The labeled data are generated by first applying the Fuzzy C-means clustering technique. To determine the partitioning automatically, multiple cluster centers corresponding to a cluster are encoded in the form of a string. The quality of each obtained partitioning is measured by computing the values of five objective functions. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on five publicly available benchmark gene expression data sets. Comparison with existing techniques for gene expression data clustering shows that the proposed method is the most effective one. Statistical and biological significance tests have also been carried out.

Keywords Gene expression data clustering • Semi-supervised classification • Multiobjective optimization • Cluster validity index • AMOSA
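The abstract states only that Fuzzy C-means is applied first to generate the labeled data; it does not say how labels are then selected. The following is a minimal sketch of one plausible reading, not the authors' procedure: run a plain fuzzy c-means and keep a label only for points whose highest membership exceeds a threshold. The function names, the fuzzifier `m = 2.0`, and the `threshold = 0.9` rule are all assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, memberships), where memberships[i][k] is the degree
    to which point i belongs to cluster k (each row sums to 1).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Memberships are recomputed from relative distances to the centers.
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(points[i], c), 1e-12) for c in centers]
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                          for j in range(n_clusters))
                for k in range(n_clusters)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        if shift < tol:  # memberships have stopped changing
            break
    return centers, u


def confident_labels(memberships, threshold=0.9):
    """Assumed selection rule: label only points with top membership >= threshold."""
    labels = {}
    for i, row in enumerate(memberships):
        k = max(range(len(row)), key=row.__getitem__)
        if row[k] >= threshold:
            labels[i] = k
    return labels
```

Under this assumed rule, points lying between clusters stay unlabeled, so only a small, relatively reliable subset of the data would carry labels into the semi-supervised step.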

1 Introduction

With the invention of DNA (deoxyribonucleic acid) microarray technology, it has become feasible to examine the expression levels of thousands of genes at a time, during different ongoing biological processes and across collections of related samples. Application areas of microarray technology include gene expression profiling, medical diagnosis, and bio-medicine [1, 22, 39]. Usually, gene expression values are measured at different time points during a biological experiment. A microarray gene expression data set is defined as a 2D matrix $A = [f_{ij}]$ of size $c \times t$, where $c$ is the number of genes and $t$ is the number of time points. Each element $f_{ij}$ tells abou
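The matrix layout described in the introduction can be sketched with a small toy example. This assumes, as the surrounding sentences suggest, that row i holds one gene and column j one time point, so that entry (i, j) is the value measured for gene i at time point j. The numbers and helper names below are hypothetical and are not taken from any data set used in the paper.

```python
# Hypothetical toy data: c = 3 genes (rows) measured at t = 4 time points (columns).
A = [
    [0.2, 0.8, 1.5, 1.1],  # gene 0
    [1.0, 0.9, 0.7, 0.4],  # gene 1
    [0.3, 0.9, 1.4, 1.0],  # gene 2
]


def shape(matrix):
    """Return (c, t): the number of genes and the number of time points."""
    return len(matrix), len(matrix[0])


def gene_profile(matrix, i):
    """Expression values of gene i across all time points (one row)."""
    return list(matrix[i])


def timepoint_column(matrix, j):
    """Expression values of all genes at time point j (one column)."""
    return [row[j] for row in matrix]
```

Clustering genes then amounts to grouping the rows of this matrix, with each gene represented by its t-dimensional expression profile.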
