Performance Analysis of Clustering Algorithm in Sensing Microblog for Smart Cities
Abstract
A smart city is an aspiration of a city's various stakeholders. We strongly believe that social media can be one of the real-time data sources that help stakeholders realize this aspiration. In this paper, we analyze the real-time data provided by Twitter in order to empower citizens by keeping them updated about what is happening around the city. We implemented several clustering algorithms on the Twitter stream, namely k-means, hierarchical agglomerative (HA) clustering, and LDA topic modeling, and report results with purity 0.476, normalized mutual information (NMI) 0.3835, and F-measure 0.54. We conclude that HA-Ward outperforms k-means and LDA substantially. We also conclude that the results are not impressive and that a separate feature-based clustering algorithm needs to be designed. We have identified various tasks for mining microblogs in the ambit of the smart city, such as event detection, geo-tagging, and city clustering based on user activity on the ground.

Keywords
Microblog analysis, Clustering, Smart city, Topic modeling
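The abstract reports purity and NMI for the clustering output. As a minimal sketch of how these two scores can be computed from cluster assignments and gold labels, the Python below uses only the standard library. The function names (purity, entropy, nmi) are hypothetical, and the NMI normalization by the arithmetic mean of the two entropies is an assumption, since the text here does not state which normalization the authors used.

```python
from collections import Counter
from math import log


def purity(clusters, labels):
    """Fraction of items that carry the majority gold label of their cluster."""
    by_cluster = {}
    for c, l in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[l] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)


def entropy(assignment):
    """Shannon entropy (natural log) of a list of cluster ids or labels."""
    n = len(assignment)
    return -sum((k / n) * log(k / n) for k in Counter(assignment).values())


def nmi(clusters, labels):
    """Mutual information divided by the mean of the two entropies.

    The choice of normalizer is an assumption; other variants divide by
    the geometric mean or the maximum of the entropies.
    """
    n = len(labels)
    joint = Counter(zip(clusters, labels))
    c_counts = Counter(clusters)
    l_counts = Counter(labels)
    mi = sum(
        (nij / n) * log(n * nij / (c_counts[c] * l_counts[l]))
        for (c, l), nij in joint.items()
    )
    denom = (entropy(clusters) + entropy(labels)) / 2
    return mi / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Toy data, not taken from the paper: six tweets, two clusters, two topics.
    toy_clusters = [0, 0, 0, 1, 1, 1]
    toy_labels = ["sports", "sports", "traffic", "traffic", "traffic", "sports"]
    print("purity:", purity(toy_clusters, toy_labels))
    print("NMI:", nmi(toy_clusters, toy_labels))
```

Purity rewards clusterings with many small clusters, which is one reason it is usually reported together with NMI, as the abstract does.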
Sandip Modha
Information Retrieval and Language Processing Lab, DA-IICT, Gandhinagar, India

Khushbu Joshi
Department of Electronics and Communication, LDRP-ITR, Gandhinagar, India

© Springer Science+Business Media Singapore 2016
S.C. Satapathy et al. (eds.), Proceedings of the International Congress on Information and Communication Technology, Advances in Intelligent Systems and Computing 439, DOI 10.1007/978-981-10-0755-2_50

1 Introduction
Today, "smart city" is a buzzword around the world, in developed and developing countries alike. The rationale behind the smart city is to provide citizens with state-of-the-art services that enable them to make informed decisions. The smart city is a multidisciplinary research area involving many engineering disciplines. We firmly believe that social media can be one of the real-time data sources that might reflect a city's dynamics. We have chosen Twitter as the microblog for this experiment. Social media users post millions of messages, called posts (or tweets in the case of Twitter), on microblogging platforms about their personal lives, politics, sports events, controversial events, and emergencies such as earthquakes, accidents, and fires. Notably, some incidents (e.g., Michael Jackson's death) are reported on social media before they appear on TV news channels [1]. It is impossible to keep track of every post because of the massive volume. To tackle this issue, we cluster these tweets using existing clustering algorithms, inspired by the approach taken by news aggregation services such as Google.

Twitter is a popular microblogging social network with 302 million [2] active users (out of 500 million) who post 400 million multilingual tweets every day. Because of the 140-character limit, tweets often contain noisy text, URLs, tags, and Twitter names [3]. Twitter users often use informal language or their native language written in Roman script, nonstan