A Novel Metric for Redundant Gene Elimination Based on Discriminative Contribution


1 Institute of System Biology, Shanghai University, Shanghai 200444, China
2 School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China
[email protected]
3 Harvard Medical School, Harvard University, Cambridge, Massachusetts 02140-0888, USA
4 National Human Genome Research Institute, National Institutes of Health (NIH), U.S. Department of Health and Human Services, Bethesda, MD 20852, USA

Abstract. As a high-dimensional problem, the analysis of microarray data sets is a hard task, in which many weakly relevant but redundant features hurt the generalization performance of classifiers. Previous works have handled this problem with linear or nonlinear filters, but these filters do not use the label information to consider the discriminative contribution of each feature. Here we propose a novel metric based on discriminative contribution to perform redundant feature elimination. With the new metric, complementary features are likely to be retained, which is beneficial for the final classification. Experimental results on several microarray data sets show that our proposed metric for redundant feature elimination based on discriminative contribution outperforms the previous state-of-the-art linear and nonlinear metrics in the analysis of microarray data sets.
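For reference, the kind of linear redundancy filter the abstract contrasts with can be sketched in a few lines of Python. Genes, already ranked by relevance, are kept greedily unless their absolute Pearson correlation with a previously kept gene reaches a threshold. The function names, the 0.9 threshold and the toy data are hypothetical illustrations; this is a baseline of the type the abstract criticizes, not the discriminative-contribution metric proposed in the paper, which is not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile is treated as uncorrelated
    return cov / (sx * sy)

def correlation_redundancy_filter(genes, ranked, threshold=0.9):
    """Hypothetical label-agnostic baseline for redundant gene elimination.

    genes:     dict mapping gene name -> expression values (one per sample)
    ranked:    gene names sorted from most to least relevant
    threshold: a gene is dropped if |r| >= threshold with any already kept gene
    """
    kept = []
    for g in ranked:
        if all(abs(pearson(genes[g], genes[k])) < threshold for k in kept):
            kept.append(g)
    return kept

genes = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.1],  # almost exactly 2 * g1, so redundant
    "g3": [4.0, 1.0, 3.0, 2.0],  # r = -0.4 with g1
}
print(correlation_redundancy_filter(genes, ["g1", "g2", "g3"]))  # ['g1', 'g3']
```

Class labels never enter `correlation_redundancy_filter`: a gene highly correlated with a kept gene is discarded whether or not it would help separate the classes, which is the limitation the abstract attributes to such filters.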

1 Introduction

The rapid advances in gene expression microarray technology enable the simultaneous measurement of the expression levels of thousands or tens of thousands of genes in a single experiment [1]. Analysis of microarray data presents unprecedented opportunities and challenges for data mining in areas such as gene clustering, class discovery, and sample classification [2,3,4]. In sample classification, a microarray data set is provided as a training set of labeled samples, and the task is to build a classifier that accurately predicts the classes of novel unlabeled samples. A typical data set has thousands of genes but only a small number of samples (often fewer than a hundred). The number of samples is likely to remain small, at least for the near future, due to the expense of collecting microarray samples [5]. The relatively high dimensionality but small sample size of microarray data causes the well-known "curse of dimensionality". Therefore, selecting a small number of discriminative genes from thousands of genes is essential for successful sample classification.


I. Măndoiu, R. Sunderraman, and A. Zelikovsky (Eds.): ISBRA 2008, LNBI 4983, pp. 256–267, 2008. © Springer-Verlag Berlin Heidelberg 2008


Feature selection, the process of choosing a subset of features from the original ones, is frequently used as a preprocessing technique in data mining. It has proven effective in reducing dimensionality, improving mining efficiency, increasing mining accuracy, and enhancing result comprehensibility [6]. In the field of bioinformatics, the most commonly used procedures for feature selection (gene selection) are based on a score which is calculated for all g
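The excerpt breaks off before naming the score, so the Python sketch below assumes one widely used per-gene score for two-class microarray data, the signal-to-noise ratio (mean_1 - mean_0) / (sd_1 + sd_0), and ranks genes by its absolute value. The score choice, the function names and the toy expression values are assumptions made for illustration, not the procedure used in the paper.

```python
import math
import statistics

def signal_to_noise(values, labels):
    """Assumed score: (mean_1 - mean_0) / (sd_1 + sd_0) for one gene.

    values: expression of the gene in each sample
    labels: class of each sample, assumed to be 1 or 0 (both classes non-empty)
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    diff = statistics.mean(pos) - statistics.mean(neg)
    spread = statistics.pstdev(pos) + statistics.pstdev(neg)
    if spread == 0:
        # Zero within-class spread: infinite score unless the means coincide.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / spread

def rank_genes(expr, labels, top_k):
    """Names of the top_k genes by |score|, most discriminative first."""
    scores = {gene: signal_to_noise(vals, labels) for gene, vals in expr.items()}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top_k]

labels = [1, 1, 1, 0, 0, 0]
expr = {
    "geneA": [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],  # higher in class 1
    "geneB": [2.0, 3.0, 1.0, 2.5, 1.5, 2.0],  # same mean in both classes
    "geneC": [0.5, 0.7, 0.6, 3.0, 2.5, 3.5],  # higher in class 0
}
print(rank_genes(expr, labels, top_k=2))  # ['geneA', 'geneC']
```

Each gene is scored independently of the others, so two near-identical genes receive near-identical scores and can both reach the top of the list; this is the redundancy that the paper sets out to eliminate.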
