Optimal subspace classification method for complex data

ORIGINAL ARTICLE

Optimal subspace classification method for complex data

Nan Li • Gong-De Guo • Li-Fei Chen • Si Chen



Received: 31 July 2011 / Accepted: 23 January 2012 / Published online: 11 February 2012
© Springer-Verlag 2012

Abstract The KNNModel algorithm is an improved version of the k-nearest neighbor method. However, it suffers from high time complexity and degraded performance when dealing with complex data. This paper proposes an optimal subspace classification method called IKNNModel, which projects the training samples of each class onto their own optimal subspace and constructs the corresponding class clusters and pure clusters as the basis of classification. For datasets with a complex structure, that is, datasets whose training samples from different categories overlap one another in the original space or are of high dimensionality, the proposed method can easily construct the corresponding clusters for the overlapping samples in their own subspaces. Experimental results show that, compared with KNNModel, the proposed method not only significantly improves classification performance on datasets with a complex structure, but also improves classification efficiency.

Keywords k-nearest neighbor • KNNModel • Subspace • Classification

N. Li • G.-D. Guo (✉) • L.-F. Chen • S. Chen
School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, China
e-mail: [email protected]

N. Li • G.-D. Guo • L.-F. Chen • S. Chen
Key Laboratory of Network Security and Cryptography, Fujian Normal University, Fuzhou, China

1 Introduction

Classification is a supervised machine learning method and an important technique in data mining,

which is crucial to many practical applications ranging from text document indexing to medical diagnosis. Many classification methods have been proposed, such as decision trees, Bayesian classifiers, SVMs [1–3] and example-based classifiers [4]. Among them, the k-nearest neighbor algorithm (KNN), proposed by Cover et al. [5], is regarded as one of the ten most classical algorithms in data mining because it is simple yet widely applicable and effective [6]. However, the parameter k, the number of neighbors, is hard to select for the KNN algorithm. Worse, as a lazy classifier, KNN does not construct an explicit classification model from the training samples and must retain all of them for classification. Moreover, classification with KNN is slow, because classifying a testing sample requires computing its similarity to every training sample. Several improved algorithms [7–10] based on KNN have been proposed to overcome these shortcomings. Among them, the KNNModel algorithm [11], proposed by Guo et al. [12], constructs a number of model clusters, rather than retaining all the training samples, as the basis of classification; it overcomes the above-mentioned shortcomings of KNN and achieves better experimental results in automatic text categorization. Both ideas are sketched below.
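To make these costs concrete, the following is a minimal sketch of the standard brute-force KNN classifier (an illustration, not the authors' code; the function names and toy dataset are our own). It exhibits the two shortcomings discussed above: no model is built at training time, and each query is compared against every training sample.

import math
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    # "Training" is just storing (train_X, train_y); all work happens at query time.
    distances = []
    for x, label in zip(train_X, train_y):
        # Euclidean distance from the query to every single training sample.
        distances.append((math.dist(x, query), label))
    distances.sort(key=lambda pair: pair[0])
    # Majority vote among the k nearest neighbors.
    top_k = [label for _, label in distances[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Toy usage: two 2-D classes.
X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
y = ['a', 'a', 'b', 'b']
print(knn_classify(X, y, (0.2, 0.1), k=3))  # -> 'a'

By contrast, the model-cluster idea can be sketched as follows. This is a deliberately simplified illustration of the general principle, not the published KNNModel algorithm, whose cluster construction is more involved: each cluster is summarized by a centroid, a covering radius and a class label, so a query is compared only against a handful of clusters rather than against all training samples.

import math

def classify_with_clusters(clusters, query):
    # clusters: list of (centroid, radius, label) triples built once from the training data.
    best_label, best_gap = None, float('inf')
    for centroid, radius, label in clusters:
        # gap <= 0 means the cluster's covering region contains the query.
        gap = math.dist(centroid, query) - radius
        if gap < best_gap:
            best_gap, best_label = gap, label
    return best_label

# Two clusters summarizing the toy dataset above.
clusters = [((0.05, 0.1), 0.3, 'a'), ((0.95, 1.05), 0.3, 'b')]
print(classify_with_clusters(clusters, (0.2, 0.1)))  # -> 'a'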