NSLPCD: Topic based tweets clustering using Node significance based label propagation community detection algorithm


Jagrati Singh1 · Anil Kumar Singh1

© Springer Nature Switzerland AG 2020

Abstract Social networks such as Twitter and Facebook have recently become the most widely used communication platforms for propagating information rapidly. This fast diffusion of information creates accuracy and scalability issues for topic detection. Most existing approaches can detect the most popular topics on a large scale, but they are not effective for fast detection. This article proposes a novel topic detection approach, the Node Significance based Label Propagation Community Detection (NSLPCD) algorithm, which detects topics faster without compromising accuracy. The proposed algorithm analyzes the frequency distribution of keywords in a collection of tweets and finds two types of keywords, topic-identifying and topic-describing keywords, which play an important role in topic detection. Based on these keywords, a keyword co-occurrence graph is built, and the NSLPCD algorithm is then applied to obtain topic clusters in the form of communities. Experimental results on real Twitter data show that the proposed method is effective in both quality and run-time performance compared with other existing methods.

Keywords Tweet clustering · Supervised and Unsupervised technique · Label propagation · Keyword co-occurrence · Topic modeling
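The pipeline outlined in the abstract (keyword frequencies, a keyword co-occurrence graph, then label propagation to obtain communities) can be illustrated with a minimal sketch. This is not the NSLPCD algorithm itself: the function names, the `min_count` frequency threshold, the whitespace tokenizer, and the use of weighted degree as a stand-in for node significance are all assumptions made for illustration, and the distinction between topic-identifying and topic-describing keywords is not modeled.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(tweets, min_count=2):
    """Build a weighted keyword co-occurrence graph {keyword: {neighbor: weight}}.

    Assumption: a keyword is kept if it occurs in at least `min_count` tweets;
    an edge weight counts the tweets in which both keywords appear.
    """
    tokenized = [set(t.lower().split()) for t in tweets]
    freq = Counter(word for tokens in tokenized for word in tokens)
    keep = {word for word, count in freq.items() if count >= min_count}
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in tokenized:
        for a, b in combinations(sorted(tokens & keep), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {node: dict(nbrs) for node, nbrs in graph.items()}


def label_propagation(graph, max_iter=20):
    """Plain asynchronous label propagation with deterministic tie-breaking.

    Assumption: nodes are visited in decreasing weighted degree, used here
    only as a placeholder for a node-importance ordering.
    """
    labels = {node: node for node in graph}
    order = sorted(graph, key=lambda n: (-sum(graph[n].values()), n))
    for _ in range(max_iter):
        changed = False
        for node in order:
            scores = defaultdict(int)
            for neighbor, weight in graph[node].items():
                scores[labels[neighbor]] += weight
            # Highest total weight wins; ties go to the alphabetically first label.
            best = min(scores, key=lambda label: (-scores[label], label))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    sample = [
        "earthquake hits city",
        "earthquake city rescue",
        "rescue earthquake city",
        "election vote results",
        "vote election results",
        "results election vote",
    ]
    communities = defaultdict(list)
    for keyword, label in label_propagation(build_cooccurrence_graph(sample)).items():
        communities[label].append(keyword)
    for members in communities.values():
        print(sorted(members))
```

On the toy tweets in the usage block, keywords about the earthquake and keywords about the election end up in two separate communities, each of which would correspond to one topic cluster.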

Jagrati Singh: [email protected]
Anil Kumar Singh: [email protected]
1 CSED, Motilal Nehru National Institute of Technology Prayagraj, Prayagraj, India

1 Introduction

The microblogging platform Twitter has become one of the most popular channels for sharing information. Nearly 500 million tweets per day, roughly 6000 tweets1 per second, are generated by 330 million active users2. Several features distinguish Twitter from news websites, blogs, and traditional information channels such as television and newspapers. Users generate tweets in real time. Because content is limited to 280 characters per tweet, Twitter is called a microblog rather than a blog, which has no restriction on content size. With the brevity enforced by the 280-character limit and the popularity of mobile applications, people tweet and retweet instantly. As a result, news often appears on Twitter first and is only later picked up by traditional news agencies.

1 https://www.dsayce.com/social-media/tweets-day/ (31 October 2019)

Tweets cover real-world events extensively, touching every aspect of daily life. Because tweets are user-generated content, users can report news about any event happening around them. This rapid and extensive diffusion of information has drawn researchers to analyze it for knowledge of current trending events. In particular, many research studies aim to answer the question, “What is the trending topic right now?” The proces