Topic representation model based on microblogging behavior analysis


Weihong Han & Zhihong Tian & Zizhong Huang & Shudong Li & Yan Jia

Received: 12 June 2019 / Revised: 30 April 2020 / Accepted: 4 May 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Microblogging has become an important way for people to obtain information, express opinions, and make suggestions. Identifying new topics quickly and accurately from massive microblogging data plays a crucial role in recommending information and controlling public opinion, and the topic representation model provides the basis for topic detection. In this paper, we propose a topic representation model for microblogging datasets based on user behavior analysis: the microblogging behavior analysis-latent Dirichlet allocation (MBA-LDA) model. The topic-word distribution is obtained from an LDA model that takes into account user behaviors (such as posting, forwarding, and commenting) as well as the distribution of words among documents within one topic and among different topics. The model also re-assesses the importance of words in topic representation. The basic idea is that the distribution of a word within a topic and among different topics strongly influences whether it should be selected as a topic expression word. If a word is evenly distributed among all documents of a topic, it is common to the documents of that topic and is therefore more suitable to represent it. If a word is evenly distributed among the various topics, it is common to all topics and cannot distinguish between them, so it is less suitable to represent any topic. Experiments on a real data set from Sina Microblogging show that the topic model based on the MBA-LDA algorithm gives representative words greater weight and increases the differentiation of topic words, which effectively improves the accuracy of subsequent topic detection and evolutionary analysis.

Keywords Topic representation model · Behavior analysis · Word distribution · LDA model · Topic detection
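The word re-weighting idea stated in the abstract can be illustrated with a minimal sketch. The MBA-LDA formulas themselves are not given in this excerpt, so the code below is not the authors' method: it assumes normalized Shannon entropy as a stand-in measure of how evenly a word is distributed, and the function names `normalized_entropy` and `representativeness` are hypothetical. User behaviors (posting, forwarding, commenting) are not modeled here.

```python
import math


def normalized_entropy(counts):
    """Evenness of a count vector, scaled to [0, 1].

    Close to 1.0 when the counts are spread evenly over all slots,
    0.0 when all occurrences fall into a single slot.
    Assumption: entropy is used only as an illustrative evenness measure.
    """
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    # Divide by the maximum possible entropy, log(number of slots).
    return max(0.0, entropy / math.log(len(counts)))


def representativeness(within_topic_doc_counts, across_topic_counts):
    """Hypothetical score for how well a word represents one topic.

    within_topic_doc_counts: occurrences of the word in each document of the topic.
    across_topic_counts: occurrences of the word in each topic.
    The score rises with evenness inside the topic and falls with
    evenness across topics, mirroring the two rules in the abstract.
    """
    within = normalized_entropy(within_topic_doc_counts)
    across = normalized_entropy(across_topic_counts)
    return within * (1.0 - across)


# A word found in every document of one topic and in no other topic.
topical_word = representativeness([3, 3, 3, 3], [12, 0, 0])
# A word found in every document of the topic but equally often in all topics.
generic_word = representativeness([3, 3, 3, 3], [12, 12, 12])
```

In this toy setting the topic-specific word scores higher than the generic word, which matches the intuition that a word shared by all topics cannot distinguish between them. Multiplying the two terms is only one possible way to combine them.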

* Zhihong Tian [email protected]
Extended author information available on the last page of the article

World Wide Web

1 Introduction

Since the launch of Twitter, a microblogging service website, microblogging has gained popularity worldwide because of its unique information-sharing mechanism [1–3]. In China, the wide adoption of platforms such as Sina Microblogging and Tencent Microblogging has made microblogging an inseparable part of people's daily lives [4]. Although microblogging is very convenient, it produces massive data sets with complex and diverse content [5, 6]. Thus, identifying new topics quickly and accurately from such data plays an important role in facilitating information recommendation services and the control of public opinion. For example, from a sociological point of view, topic discovery helps to reveal the formation process and evolution

