A Novel Algorithm for Text Classification Based on KNN and Chaotic Binary Particle Swarm Optimization


Hui Xu, Shoudong Lu and Shixiang Zhou

Abstract The main problem of Chinese text classification is the high-dimensional feature space. A novel algorithm for text classification based on KNN and chaotic particle swarm optimization is proposed. The algorithm uses a chaotic particle swarm algorithm to traverse the feature space of the training set and select a feature subspace, and then uses the KNN algorithm to classify texts in that subspace. During the particle swarm's iterations, a chaotic map guides the swarm in a chaotic search, which helps the algorithm escape local optima and improves its ability to find the global optimum. Experimental results show that the novel algorithm is effective for Chinese text classification: its classification accuracy and recall rate are better than those of the KNN algorithm.

Keywords Binary particle swarm · Chaos · KNN · Text classification
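The abstract describes the chaotic binary PSO search only at a high level. The sketch below shows one common way to combine binary PSO with a chaotic map for feature selection; it is an illustration under stated assumptions, not the authors' exact method. Assumed here: the logistic map z ← 4z(1 − z) as the chaotic sequence, the sigmoid transfer function of standard binary PSO, a chaotic local search that flips one bit of the global best per step, and all parameter values (w, c1, c2, v_max, chaos_steps). The `fitness` callback and the function names are hypothetical; in the paper's setting the fitness would score a feature mask, for example by KNN accuracy on held-out training texts.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]  # mask[d] == 1 means feature d is in the selected subspace


def logistic_map(z: float, mu: float = 4.0) -> float:
    """One step of the logistic map (assumed chaotic map); mu = 4 is chaotic on (0, 1)."""
    return mu * z * (1.0 - z)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def cbpso_select(
    n_features: int,
    fitness: Callable[[Mask], float],  # hypothetical scorer, higher is better
    n_particles: int = 10,
    n_iter: int = 30,
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    v_max: float = 4.0,
    chaos_steps: int = 10,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Return the best feature mask found and its fitness (assumed parameters)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    z = rng.uniform(0.01, 0.99)  # chaotic variable

    for _ in range(n_iter):
        # Standard binary PSO update for every particle.
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # sigmoid(velocity) is the probability that the bit becomes 1.
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f

        # Assumed chaotic local search: the logistic map chooses which bit of gbest to flip.
        for _ in range(chaos_steps):
            z = logistic_map(z)
            if z in (0.0, 0.25, 0.5, 0.75, 1.0):  # values that lead to a fixed point
                z = rng.uniform(0.01, 0.99)
            d = min(int(z * n_features), n_features - 1)
            candidate = gbest[:]
            candidate[d] = 1 - candidate[d]
            f = fitness(candidate)
            if f > gbest_fit:  # accept only improvements
                gbest, gbest_fit = candidate, f
    return gbest, gbest_fit
```

Because the chaotic step only accepts improving flips, the global best never gets worse, while the non-repeating logistic sequence spreads the perturbations across different feature positions. A size penalty could be added to `fitness` if smaller subspaces are preferred; that choice is not specified in this section.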

H. Xu, S. Lu
School of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003 Guangxi, China

S. Zhou
College of Science, Shandong University of Technology, Zibo, Shandong, China

W. Lu et al. (eds.), Proceedings of the 2012 International Conference on Information Technology and Software Engineering, Lecture Notes in Electrical Engineering 211, DOI: 10.1007/978-3-642-34522-7_66, © Springer-Verlag Berlin Heidelberg 2013

66.1 Introduction

Text classification is the process of automatically assigning text content to one or more predefined categories. With the rapid growth of online text on the Internet, text classification has become one of the primary technologies for processing and organizing text data. Automatic text classification has been applied in web page classification, topic identification, real-time document sorting, information retrieval, search engines, e-mail classification, information filtering and other areas. In recent years, researchers have proposed a variety of text classification algorithms [1], such as Support Vector Machines (SVM), K-Nearest Neighbor (KNN), artificial neural networks, the naïve Bayes probabilistic classifier and decision trees. The major difficulties of text classification are the high dimensionality of the feature space and the sparsity of text vectors [2]. Reducing the dimensionality without losing classification performance is therefore the most important issue in text classification. This paper proposes a novel text classification algorithm that uses chaotic binary particle swarm optimization (CBPSO) to select optimal feature items from the training text set as a feature subspace, and then performs KNN classification in that subspace.
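To make "KNN classification in a feature subspace" concrete, the following sketch classifies a sparse text vector using only the features switched on by a binary mask, the kind of mask a feature-selection search would produce. Cosine similarity, majority voting and the dict-based sparse representation are assumptions for illustration; the paper's exact similarity measure and voting rule are not given in this section, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Set, Tuple

SparseVec = Dict[int, float]  # feature index -> weight (e.g. TF-IDF); absent means 0


def project(vec: SparseVec, selected: Set[int]) -> SparseVec:
    """Keep only the coordinates that lie in the selected feature subspace."""
    return {k: v for k, v in vec.items() if k in selected}


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(
    doc: SparseVec,
    train: Sequence[Tuple[SparseVec, str]],
    mask: Sequence[int],
    k: int = 3,
) -> str:
    """Majority vote among the k most similar training texts, measuring
    similarity only on features where mask[i] == 1 (assumed voting rule)."""
    selected = {i for i, bit in enumerate(mask) if bit}
    q = project(doc, selected)
    scored = sorted(
        ((cosine(q, project(x, selected)), label) for x, label in train),
        key=lambda t: t[0],
        reverse=True,
    )
    # Counter.most_common breaks ties by first occurrence, i.e. the closer neighbour.
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        ({0: 1.0, 1: 1.0}, "sports"),
        ({0: 1.0, 2: 0.5}, "sports"),
        ({3: 1.0, 4: 1.0}, "finance"),
        ({3: 1.0, 5: 0.5}, "finance"),
    ]
    doc = {0: 1.0, 5: 0.2}
    print(knn_classify(doc, train, [1, 1, 1, 1, 1, 1], k=3))  # sports
    print(knn_classify(doc, train, [0, 0, 0, 1, 1, 1], k=1))  # finance
```

In this toy data, restricting the mask to features 3, 4 and 5 changes the nearest neighbour from a "sports" text to a "finance" text, which illustrates why the choice of feature subspace directly affects classification accuracy.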

66.2 Chinese Text Preprocessing

The process of text classification includes text preprocessing, classifier design and evaluation. Since Chinese text is natural-language text with almost no structure, it is difficult for a computer to understand it directly. Thus, for Chinese text classifi