Optimal subspace classification method for complex data

ORIGINAL ARTICLE

Optimal subspace classification method for complex data

Nan Li • Gong-De Guo • Li-Fei Chen • Si Chen



Received: 31 July 2011 / Accepted: 23 January 2012 / Published online: 11 February 2012
© Springer-Verlag 2012

Abstract The KNNModel algorithm is an improved version of the k-nearest neighbor method. However, it suffers from high time complexity and degraded performance when dealing with complex data. This paper proposes an optimal subspace classification method called IKNNModel, which projects the training samples of each class onto their own optimal subspace and constructs the corresponding class clusters and pure clusters as the basis of classification. For datasets with a complex structure, that is, datasets whose training samples from different categories overlap one another in the original space or are of high dimensionality, the proposed method can easily construct the corresponding clusters for the overlapping samples in their own subspaces. Experimental results show that, compared with KNNModel, the proposed method not only significantly improves classification performance on datasets with a complex structure, but also improves classification efficiency.

Keywords k-nearest neighbor • KNNModel • Subspace • Classification

N. Li • G.-D. Guo (✉) • L.-F. Chen • S. Chen
School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, China
e-mail: [email protected]

N. Li • G.-D. Guo • L.-F. Chen • S. Chen
Key Laboratory of Network Security and Cryptography, Fujian Normal University, Fuzhou, China

1 Introduction

Classification is a supervised machine learning method and an important technique in data mining,

which is crucial to many practical applications ranging from text document indexing to medical diagnosis. Many classification methods have been proposed, such as decision trees, Bayesian classifiers, SVMs [1–3] and example-based classifiers [4]. Among them, the k-nearest neighbor algorithm (KNN), proposed by Cover et al. [5], is regarded as one of the ten most classical algorithms in data mining because it is simple yet widely applicable and effective [6]. However, the parameter k, the number of neighbors, is hard to select for the KNN algorithm. Worse, as a lazy classifier, KNN does not construct an explicit classification model from the training samples and must retain all of them for classification. Moreover, classification with KNN is slow, because classifying a testing sample requires computing its similarity to every training sample. Several improved algorithms [7–10] based on KNN have been proposed to overcome these shortcomings. Among them, the KNNModel algorithm [11], proposed by Guo et al. [12], constructs a number of model clusters, rather than retaining all the training samples, as the basis of classification; it overcomes the above-mentioned shortcomings of KNN and achieves better experimental results in automatic text categorization. Both ideas are sketched below.
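To make these costs concrete, the following is a minimal sketch of the standard brute-force KNN classifier (an illustration, not the authors' code; the function names and toy dataset are our own). It exhibits the two shortcomings discussed above: no model is built at training time, and each query is compared against every training sample.

import math
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    # "Training" is just storing (train_X, train_y); all work happens at query time.
    distances = []
    for x, label in zip(train_X, train_y):
        # Euclidean distance from the query to every single training sample.
        distances.append((math.dist(x, query), label))
    distances.sort(key=lambda pair: pair[0])
    # Majority vote among the k nearest neighbors.
    top_k = [label for _, label in distances[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Toy usage: two 2-D classes.
X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
y = ['a', 'a', 'b', 'b']
print(knn_classify(X, y, (0.2, 0.1), k=3))  # -> 'a'

By contrast, the model-cluster idea can be sketched as follows. This is a deliberately simplified illustration of the general principle, not the published KNNModel algorithm, whose cluster construction is more involved: each cluster is summarized by a centroid, a covering radius and a class label, so a query is compared only against a handful of clusters rather than against all training samples.

import math

def classify_with_clusters(clusters, query):
    # clusters: list of (centroid, radius, label) triples built once from the training data.
    best_label, best_gap = None, float('inf')
    for centroid, radius, label in clusters:
        # gap <= 0 means the cluster's covering region contains the query.
        gap = math.dist(centroid, query) - radius
        if gap < best_gap:
            best_gap, best_label = gap, label
    return best_label

# Two clusters summarizing the toy dataset above.
clusters = [((0.05, 0.1), 0.3, 'a'), ((0.95, 1.05), 0.3, 'b')]
print(classify_with_clusters(clusters, (0.2, 0.1)))  # -> 'a'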