Short Text Classification Technology Based on KNN+Hierarchy SVM


School of Computer and Software, Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing, China [email protected] 2 College of Information Engineering, Yangzhou University, Yangzhou, China

Abstract. A short text classification method based on a combination of KNN and hierarchical SVM is proposed. First, the KNN algorithm is improved to obtain the class labels of the K nearest neighbors quickly, so as to effectively filter the candidate classes of a document. The document is then classified from top to bottom using a multi-class sparse hierarchical SVM classifier. In this way, documents can be classified efficiently.

Keywords: KNN · Hierarchical SVM · Candidate classes · Short text
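The two-stage idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the improved KNN and the sparse hierarchical SVM are not specified in this excerpt, so the sketch uses plain cosine-similarity KNN for candidate filtering and, as a stand-in for the SVM nodes, a centroid-similarity scorer that descends a two-level hierarchy. The function names, the "parent/leaf" label format and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def candidate_classes(query, train, k=3):
    """Stage 1: labels of the k training texts most similar to the query."""
    q = bow(query)
    ranked = sorted(train, key=lambda pair: cosine(q, bow(pair[0])), reverse=True)
    return {label for _, label in ranked[:k]}

def centroid(train, prefix):
    """Summed term counts of all training texts at or below a hierarchy node."""
    total = Counter()
    for text, label in train:
        if label == prefix or label.startswith(prefix + "/"):
            total.update(bow(text))
    return total

def classify_top_down(query, train, k=3):
    """Stage 2: descend the hierarchy, scoring only the candidate classes.

    Assumption: a centroid-similarity scorer stands in for the SVM at each node.
    """
    q = bow(query)
    candidates = candidate_classes(query, train, k)
    # Level 1: choose among the top-level parents of the candidates only.
    parents = sorted({label.split("/")[0] for label in candidates})
    best_parent = max(parents, key=lambda p: cosine(q, centroid(train, p)))
    # Level 2: choose among the candidate leaves under the chosen parent.
    leaves = sorted(l for l in candidates if l.split("/")[0] == best_parent)
    return max(leaves, key=lambda l: cosine(q, centroid(train, l)))

# Hypothetical toy data with "parent/leaf" labels.
train = [
    ("goal scored in the football match", "sports/football"),
    ("striker wins football cup final", "sports/football"),
    ("tennis serve wins the set", "sports/tennis"),
    ("new phone battery lasts longer", "tech/phones"),
    ("laptop cpu benchmark results", "tech/laptops"),
]
query = "football striker scored a goal and wins"

print(candidate_classes(query, train, k=3))
print(classify_top_down(query, train, k=3))
```

The point of the filtering step is that the second stage only evaluates classes that survived stage 1, so the cost of the top-down pass depends on the number of candidates rather than on the size of the full class hierarchy.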

1 Introduction

The widespread use of the Internet calls for short text classification technology to handle ubiquitous data such as Internet news, blogs and mail. This technology, known as text mining, automatically extracts valuable information and knowledge from such data. According to the length of the text, text mining can be divided into long text mining and short text mining, although the two were not clearly distinguished in the early stages of research on this technology [1]. With the rise of social media, short texts such as mobile text messages [2], tweets and microblog posts are proliferating. The growing number of users of these applications makes the volume of short texts larger and larger. In addition, short texts play a critical role in search engines, automatic question answering, topic tracking and other fields. As a result, short text mining is attracting increasing attention from researchers [3].

Popular short text classification algorithms include the K Nearest Neighbor (KNN) algorithm and the Support Vector Machine (SVM) algorithm. Both methods have a clear advantage in recall and accuracy. Although the KNN algorithm is simple in principle and its classification efficiency is high, it is an instance-based statistical learning method that is not very accurate when classifying samples near class boundaries. The SVM classification algorithm aims to maximize the distance between the classification boundaries, so its classification accuracy is relatively high. However, this also makes the process of training the classifier relatively slow. In short, it is difficult to achieve the desired classification efficiency and effectiveness using either of these two methods alone. Therefore, by combining the KNN and SVM algorithms, researchers can improve both the accuracy and the efficiency of classification, so that massive document collections can be classified automatically with better results. When the classification structure of the document forms a hie

© Springer Nature Singapore Pte Ltd. 2017 J.J. (Jong Hyuk) Park et al. (eds.), Advanced Multimedia and Ubiquitous Engineering, Lecture Notes in Electrical Engineering 448, DOI 10.1007/978-981-10-5041-1_100

C. Yin et al.
