Design and analysis of a large-scale COVID-19 tweets dataset

Rabindra Lamsal¹

Accepted: 16 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

As of July 17, 2020, more than thirteen million people have been diagnosed with the Novel Coronavirus (COVID-19), and half a million people have already lost their lives to this infectious disease. The World Health Organization declared the COVID-19 outbreak a pandemic on March 11, 2020. Since then, social media platforms have experienced an exponential rise in content related to the pandemic. In the past, Twitter data have been observed to be indispensable for extracting situational awareness information during a crisis. This paper presents the COV19Tweets Dataset (Lamsal 2020a), a large-scale Twitter dataset with more than 310 million COVID-19 specific English language tweets and their sentiment scores. The dataset's geo version, the GeoCOV19Tweets Dataset (Lamsal 2020b), is also presented. The paper discusses the datasets' design in detail and analyzes the tweets in both datasets. The datasets are released publicly in the expectation that they will contribute to a better understanding of the spatial and temporal dimensions of the public discourse related to the ongoing pandemic. According to the access statistics, the datasets (Lamsal 2020a, 2020b) have collectively been accessed over 74.5k times.

Keywords Social computing · Crisis computing · Sentiment analysis · Network analysis · Twitter data

This article belongs to the Topical Collection: Artificial Intelligence Applications for COVID-19, Detection, Control, Prediction, and Diagnosis

Rabindra Lamsal
[email protected]

1 School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi 110067, India

1 Introduction

1.1 Social media and crisis events

During a crisis, whether natural or man-made, people tend to spend more time on social media than usual. As a crisis unfolds, social media platforms such as Facebook and Twitter become an active source of information [20] because these platforms break the news faster than official news channels and emergency response agencies [23]. During such events, people usually hold informal conversations in which they share their safety status, ask about their loved ones' safety status, and report ground-level scenarios of the event [11, 20]. This continuous creation of conversations on public platforms leads to the accumulation of a large amount of socially generated data. The amount of data can range from hundreds of thousands to millions [25]. With proper planning and implementation, social media data can be analyzed and processed to extract situational information that can be further used to derive actionable intelligence for an effective response to the crisis. This situational information can be extremely beneficial for first responders and decision-makers in developing strategies for a more efficient response to the crisis. In recent times, the most used social media platf