Short Text Classification Technology Based on KNN+Hierarchy SVM

A short text classification method based on combination of KNN and hierarchical SVM is proposed. First, the KNN algorithm is improved to get the K nearest neighbor class labels quickly, so as to effectively filter the candidate classes of documents. And t

  • PDF / 217,367 Bytes
  • 7 Pages / 439.37 x 666.142 pts Page_size
  • 10 Downloads / 164 Views

DOWNLOAD

REPORT


School of Computer and Software, Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing, China [email protected] 2 College of Information Engineering, Yangzhou University, Yangzhou, China

Abstract. A short text classification method based on combination of KNN and hierarchical SVM is proposed. First, the KNN algorithm is improved to get the K nearest neighbor class labels quickly, so as to effectively filter the candidate classes of documents. And then classify them from top to bottom using a multi-class sparse hierarchical SVM classifier. By this way, the document can be classified efficiently. Keywords: KNN

 Hierarchical SVM  Candidate classes  Short text

1 Introduction The popularity of the use of Internet demands the technology of short text classification to deal with the ubiquitous data, such as Internet news, blog and mail, etc. The technology, known as text mining, is that automatically extracts valuable information and knowledge from those data mentioned which has been mentioned above. According to the length of text, text mining can be divided into long text mining and short text mining, while this two text mining methods did not clearly distinguish in the early stages of this technology research [1]. With the rise of social media, mobile text messages [2], Tweet and microblogging and other short text are emerging uncontrollably. The growing number of users of these applications makes the size of short texts larger and larger. In addition, the short text in the search engine, automatic questioning and topic tracking and other fields play a critical role. By and large, short text mining is increasingly concerned by researchers [3]. The popular short text classification algorithms include K Nearest Neighbor (KNN) algorithm and Support Vector Machines (SVM) algorithm. Specifically, KNN and SVM methods have a huge advantage on the recall rate and accuracy. Although KNN algorithm is simple in principle and its classification efficiency is high enough, it is an instance-based statistical learning method which is not very accurate for classifying samples at class boundaries. The SVM classification algorithm aims to maximize the distance between the classification boundaries, so the classification accuracy is relatively high. However, it also reduces to the process of training classifier

© Springer Nature Singapore Pte Ltd. 2017 J.J. (Jong Hyuk) Park et al. (eds.), Advanced Multimedia and Ubiquitous Engineering, Lecture Notes in Electrical Engineering 448, DOI 10.1007/978-981-10-5041-1_100

634

C. Yin et al.

relatively slow. In short, the use of either of these two methods alone is difficult to achieve the desired classification efficiency and effectiveness. Therefore, by combining KNN and SVM algorithms, the researchers can not only improve the accuracy of classification, but also improve the efficiency of classification, which can automatically classify the mass documents to achieve better results. When the classification structure of the document forms a hie