Iktishaf: a Big Data Road-Traffic Event Detection Tool Using Twitter and Spark Machine Learning



Ebtesam Alomari 1 & Iyad Katib 1 & Rashid Mehmood 2

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Road transportation is the backbone of modern economies, yet every year it costs millions of deaths and injuries and trillions of dollars. Twitter is a powerful information source for transportation, but major challenges in big data management and Twitter analytics need to be addressed. We propose Iktishaf, a big data tool developed over Apache Spark for detecting traffic-related events from Twitter data in Saudi Arabia. It uses three machine learning (ML) algorithms to build multiple classifiers that detect eight event types. The classifiers are validated using widely used criteria and against external sources. The Iktishaf Stemmer improves text preprocessing, event detection, and the feature space. Using 2.5 million tweets, we detect events without prior knowledge, including the KSA national day, a fire in Riyadh, rains in Makkah and Taif, and the inauguration of the Al-Haramain train. We are not aware of any work, apart from ours, that uses big data technologies to detect road traffic events from tweets in Arabic. Iktishaf provides hybrid human-ML methods and is a prime example of bringing together AI theory, big data processing, and human cognition to address a practical problem.

Keywords Machine learning · Big data · Social media · Apache Spark · Event detection · Arabic stemmer · Road traffic · Twitter

* Ebtesam Alomari [email protected]
Iyad Katib [email protected]
Rashid Mehmood [email protected]

1 Faculty of Computing and Information Technology, King AbdulAziz University, Jeddah, Saudi Arabia
2 High Performance Computing Center, King AbdulAziz University, Jeddah, Saudi Arabia

1 Introduction

Road transportation is the backbone of modern economies, although it costs 1.25 million deaths and 50 million injuries globally each year. Moreover, traffic congestion is among the leading problems in modern cities, with concerns including the cost of congestion (for the US alone, the cost is $305 billion). Worsening road traffic, bad weather, roadworks, and other uncertainties cause congestion, accidents, and other damage to public health. These causes (events) must be detected to support timely interventions for traffic planning and operations and to reduce adverse effects on public health and resources. Smart societies aim to develop novel solutions for transportation and other sectors through timely analysis of diverse data produced by smartphones, cameras, GPS, the Internet of Things (IoT), and social media. Twitter is among the most favored social media platforms, providing a virtually omnipresent source of information on public matters. People and organizations post tweets (short text messages) sharing news, events, status updates, etc., generating vast amounts of real-time data on various topics, including transportation. Twitter is emerging as a powerful sensor for event detection [1], cong