Scalable Online Analytics on Cloud Infrastructures





Department of CSE & IT, The NorthCap University, Gurugram, India [email protected]
School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India [email protected], [email protected]

Abstract. The need for low-latency analysis of high-velocity, real-time continuous data streams has led to the emergence of Stream Processing Systems (SPSs). Contemporary SPSs allow a stream processing application to be hosted on Cloud infrastructures and dynamically scaled so as to adapt to fluctuating data rates. However, the run-time scalability mechanisms incorporated in these SPSs are still early adaptations and are based on simple local/global threshold-based controls. This work studies the issues with local and global auto-scaling techniques that may lead to performance inefficiencies in real-time traffic analysis on Cloud platforms, and presents an efficient hybrid auto-scaling strategy, StreamScale, which addresses the identified issues. The proposed StreamScale auto-scaling algorithm accounts for the gaps in the local/global scaling approaches and effectively identifies (de)parallelization opportunities in stream processing applications for maintaining QoS at reduced cost. Simulation-based experimental evaluation on representative stream application topologies indicates that the proposed StreamScale auto-scaling algorithm performs better than both local and global auto-scaling approaches.

Keywords: Stream Processing Systems (SPS) · Scalability · Online analytics · Cloud computing · Internet of Things (IoT)

1 Introduction

The emergence of IoT, involving a network of virtually ubiquitous sensors, has led to a scenario where huge volumes of continuous data streams are generated that need to be processed in real time. As a result, data stream processing has recently surfaced as a new computational paradigm. It involves real-time analysis of ‘data in motion’ so as to extract actionable information and intelligence for productive decision making. Examples of streaming applications may be found in diverse domains, e.g., real-time traffic analysis for congestion prediction, security intelligence for fraud detection, QoS monitoring for end-user services, and continuous trend analysis in social networking models. Over the past few years, a number of SPSs have been developed to support continuous analytics of data streams. Examples include commercial solutions, e.g., StreamBase [12] and InfoSphere [3]; open-source solutions, e.g., Apache Storm [4] and S4 [8]; and academic solutions, e.g., STREAM [2] and Borealis [1].

© Springer Nature Singapore Pte Ltd. 2017 M. Singh et al. (Eds.): ICACDS 2016, CCIS 721, pp. 399–408, 2017. DOI: 10.1007/978-981-10-5427-3_43


J. Sahni and D.P. Vidyarthi

Streaming applications are structured as directed graphs where vertices are the processing elements (PEs) and edges are the data streams. A classical SPS [1, 2] allows these applications to be executed on fixed size clusters with PEs distributed among different nodes. The cluster size is generally chosen to mee