A Classification Model to Analyze the Spread and Emerging Trends of the Zika Virus in Twitter
Abstract The Zika virus disease caused an epidemic in 2015–16 and continues to be a global health issue. The recent trend of sharing critical information on social networks such as Twitter motivated us to propose a classification model that classifies tweets related to Zika and thus enables us to extract helpful insights into the community. In this paper, we explain the process of collecting data from Twitter, preprocessing the data, and building a model to fit the data. We then compare the accuracy of support vector machines and the Naïve Bayes algorithm for text classification and state the reason for the superiority of support vector machines over Naïve Bayes. Useful analytical tools such as word clouds are also presented to provide a more sophisticated method of retrieving community support from social networks such as Twitter.
Keywords Zika ⋅ Twitter analysis ⋅ Twitter classification ⋅ Support vector machines ⋅ Naïve Bayes algorithm
B.K. Tripathy (✉) ⋅ S. Thakur ⋅ R. Chowdhury, School of Computing Science and Engineering, VIT University, Vellore, India. © Springer Nature Singapore Pte Ltd. 2017. H.S. Behera and D.P. Mohapatra (eds.), Computational Intelligence in Data Mining, Advances in Intelligent Systems and Computing 556, DOI 10.1007/978-981-10-3874-7_61

1 Introduction
The Zika virus causes the Zika disease and is primarily carried by mosquitoes of the Aedes species. The incubation period of the disease lasts at most a week, and its symptoms include fever, rashes, headache, and conjunctivitis. The Zika virus was declared a Public Health Emergency of International Concern (PHEIC) by the World Health Organization (WHO) on February 1, 2016. At present, there is no vaccine or other form of treatment for this disease, which makes it a serious global health issue.

Social networks such as Twitter and Facebook have often been treated as useful sources of information for community support during outbreaks, especially on the global spectrum [9]. Twitter is a popular microblogging Web site where users interact socially by posting messages, the so-called ‘tweets’, on the Twitter platform. Twitter data have previously been used for various kinds of data analysis, such as sentiment analysis [3, 8] and event detection, and can be easily accessed through the publicly available Twitter API (application programming interface). Twitter is highly popular as a mobile application throughout the world, and because tweets have a 140-character limit, they can be considered concise sources of information [4]. Moreover, many verified accounts belong to reputed people, organizations, and communities, which adds credibility to their tweets.

Preprocessing of the tweets: The Twitter Streaming API was used to collect the most recent tweets. The tweets collected by the API are then preproces
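As a rough illustration of what a tweet preprocessing step can look like, the following minimal sketch normalizes a raw tweet into tokens. The specific cleaning rules (lowercasing, removing URLs, user mentions, and the leading retweet marker, and dropping punctuation) are assumptions made for illustration; the excerpt above does not state which steps the authors applied, and the function name `preprocess_tweet` is hypothetical.

```python
import re

# Hypothetical cleaning rules, assumed for illustration only;
# they are not taken from the paper.
URL_RE = re.compile(r"https?://\S+")        # web links
MENTION_RE = re.compile(r"@\w+")            # user mentions such as @WHO
RETWEET_RE = re.compile(r"^rt\s+")          # leading retweet marker
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")    # '#', punctuation, other symbols


def preprocess_tweet(text):
    """Return a list of lowercase tokens extracted from a raw tweet."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = RETWEET_RE.sub("", text.strip())
    # Keeps hashtag words but drops the '#' sign; also drops non-ASCII letters.
    text = NON_WORD_RE.sub(" ", text)
    return text.split()


if __name__ == "__main__":
    raw = "RT @WHO: Zika virus declared a #PHEIC! https://t.co/abc123"
    print(preprocess_tweet(raw))
```

Token lists of this kind could then be turned into feature vectors for a text classifier; stop-word removal or stemming, if desired, would be additional steps beyond this sketch.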