A Novel Graph Partitioning Criterion Based Short Text Clustering Method

A novel clustering method based on spectral clustering theory and spectral cut standard is proposed via analyzing the characteristics of short text and the defects of the existing clustering algorithms. First of all, a weighted undirected graph is created

PDF / 729,875 Bytes
11 Pages / 439.37 x 666.142 pts Page_size
89 Downloads / 273 Views

DOWNLOAD

REPORT

Abstract. A novel clustering method based on spectral clustering theory and spectral cut standard is proposed via analyzing the characteristics of short text and the defects of the existing clustering algorithms. First of all, a weighted undirected graph is created according to spectral clustering theory, similarity between node and node is calculated on graph, and a symmetrical documents similarity matrix is constructed, which provides all information for the clustering algorithm. Inspired by Greedy strategy, we utilize prim to develop PrimMAE algorithm for the purpose of partitioning graph into two parts, in which RMcut is termination condition of partitioning process, and then it is fed into CASC algorithm to cut the documents set iteratively. Ultimately, high quality clustering results demonstrate the effectiveness of the new clustering algorithm. Keywords: Short text Similarity matrix RMcut criterion Prim algorithm Clustering algorithm

1 Introduction With the increasing growth of the internet, social media based on it such as news title, micro-blog, instant message and so on has become increasingly popular, which produces a large number of information every day, and the key here is that most of the information is appeared in the form of short text. Different from traditional text, short text is usually of the following characteristics [1]: (1) Growing dynamically. That is to say, they have a rapid growth rate. (2) Text is short. Usually there are fewer words in short texts, so they are lack of sufﬁcient statistical information for effective similarity measure. (3) The number of short texts is huge. The existing text clustering methods cannot meet the need of theoretical analysis for the emerging wide range of short texts. Traditional document clustering methods cannot accomplish the task effectively, we need to research and design more suitable methods. Therefore how to improve the efﬁciency of clustering the mass of short text through information fusion has become the researching focus [2]. In recent years, clustering algorithms for short texts are constantly raised. Tang et al. [3] extended short text representation via a multi-language representation method by means of machine translation. Wang et al. [4] proposed an extension of VSM (vector space model) for communication short texts, called WR-KMeans. Yin and Wang [5] calculated the semantic distance between two word strings as balance of the © Springer International Publishing Switzerland 2016 D.-S. Huang et al. (Eds.): ICIC 2016, Part III, LNAI 9773, pp. 338–348, 2016. DOI: 10.1007/978-3-319-42297-8_32

A Novel Graph Partitioning Criterion

339

extent of word sequence alignment and the meaning matching between word strings. Peng et al. [6] explored the text characteristic of high dimensions and sparse space, proposed a novel text clustering algorithm based on semantic inner space model. Methods mentioned above are mainly focused on how to carry out feature representation and similarity calculation of short texts. However, the clustering perfo

Data Loading...

A Novel Graph Partitioning Criterion Based Short Text Clustering Method

Recommend Documents

Improvement of Short Text Clustering Based on Weighted Word Embeddings

Graph-based Clustering

Graph Partitioning

Deep Hierarchical Non-negative Matrix Factorization for Clustering Short Text

Single-Cell Clustering Based on Shared Nearest Neighbor and Graph Partitioning

Text Clustering

A Novel MapReduce Based k-Means Clustering

A Chinese expert disambiguation method based on semi-supervised graph clustering

mPartition: A Model-Based Method for Partitioning Alignments

Rosen's partitioning method

GCNSP: A Novel Prediction Method of Self-Interacting Proteins Based on Graph Convolutional Networks

A Novel Clustering-Based Filter Pruning Method for Efficient Deep Neural Networks