Classification of Microarrays with kNN: Comparison of Dimensionality Reduction Methods

Abstract. Dimensionality reduction can often improve the performance of the k-nearest neighbor classifier (kNN) for high-dimensional data sets, such as microarrays. The effect of the choice of dimensionality reduction method on the predictive performance of kNN for classifying microarray data is an open issue, and four common dimensionality reduction methods, Principal Component Analysis (PCA), Random Projection (RP), Partial Least Squares (PLS) and Information Gain (IG), are compared on eight microarray data sets. It is observed that all dimensionality reduction methods result in more accurate classifiers than those obtained from the raw attributes. Furthermore, both PCA and PLS reach their best accuracies with fewer components than the other two methods, and RP needs far more components than the others to outperform kNN on the non-reduced data set. None of the dimensionality reduction methods can be concluded to generally outperform the others, although PLS is shown to be superior on all four binary classification tasks. The main conclusion of the study is that the choice of dimensionality reduction method can be of major importance when classifying microarrays using kNN.
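
As a rough illustration of the comparison just summarized, the sketch below pairs each of the four dimensionality reduction methods with kNN using scikit-learn stand-ins: mutual information serves as an analogue of information gain, a Gaussian random projection stands in for RP, and synthetic data replaces the actual microarray sets. The component count and neighbor count are placeholders, not the settings used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.random_projection import GaussianRandomProjection
from sklearn.cross_decomposition import PLSRegression
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a microarray set: few samples, thousands of attributes.
X, y = make_classification(n_samples=100, n_features=2000,
                           n_informative=50, random_state=0)

n_comp, k = 10, 3  # illustrative component and neighbor counts

def with_knn(*reduction):
    """Pipeline applying an optional reduction step, then kNN."""
    return make_pipeline(*reduction, KNeighborsClassifier(n_neighbors=k))

pipelines = {
    "raw": with_knn(),  # kNN on the non-reduced attributes
    "PCA": with_knn(PCA(n_components=n_comp)),
    "RP":  with_knn(GaussianRandomProjection(n_components=n_comp, random_state=0)),
    "PLS": with_knn(PLSRegression(n_components=n_comp)),  # supervised: uses y
    "IG":  with_knn(SelectKBest(mutual_info_classif, k=n_comp)),
}

for name, pipe in pipelines.items():
    # Each reduction is re-fitted inside every fold, avoiding information leakage.
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name:>4}: mean accuracy {acc:.3f}")
```

Wrapping each reduction in a pipeline ensures that supervised methods such as PLS and IG only see the training portion of each fold, which mirrors how such a comparison must be evaluated.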

1 Introduction

Microarray gene-expression technology has spread across the research community with immense speed during the last decade [1]. Being able to learn effectively from data generated through this technology is important for many reasons, including allowing for early and accurate diagnoses, which may lead to the proper choice of treatments and therapies [2,3]. On the other hand, this type of high-dimensional data, often involving thousands of attributes, creates challenges for many learning algorithms, including the well-known k-nearest neighbor classifier (kNN) [4].

The kNN has a very simple strategy as a learner: instead of generating an explicit model, it keeps all training instances. A classification is made by measuring the distances from the test instance to all training instances, most commonly using the Euclidean distance, and assigning the majority class among the k nearest instances to the test instance. This simple form of kNN can, however, be both inefficient and ineffective for high-dimensional data sets, due to the presence of irrelevant and redundant attributes; the classification accuracy of kNN therefore often decreases as the dimensionality grows. One possible remedy to this problem that has earlier been shown to be successful is dimensionality reduction [5]. The kNN has been demonstrated to allow for successful classification of microarrays [2], and it has also been shown that dimensionality reduction can further improve the performance of kNN for this task [5]. However, it is an open question whether the choice of dimensionality reduction technique has any impact on this performance, and for this purpose, four commonly employed dimensionality reduction methods are compared in this study.
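
A minimal sketch of the basic kNN strategy described above (function names, data, and defaults are illustrative only, not the paper's implementation):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify one test instance by majority vote among its
    k nearest training instances (illustrative sketch)."""
    # Euclidean distances from the test instance to all training instances.
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Indices of the k training instances closest to the test instance.
    nearest = np.argsort(dists)[:k]
    # Assign the majority class among the k nearest neighbors.
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Toy usage: two classes in two dimensions.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.2], [0.9, 0.8]])
y_train = np.array([0, 1, 0, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))  # -> 0
```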