Optimized distributed large-scale analytics over decentralized data sources with imperfect communication
Reza Shahbazian1 · Francesca Guerriero2
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Tremendous amounts of data are generated by sensors and connected devices, at high velocity, in a variety of forms and in large volumes. These characteristics, which define big data, call for new models and methods capable of processing the data in near real time. The decentralized nature of large-scale data sources requires distributed algorithms, in which each data source is assumed to be capable of processing its own data and collaborating with neighboring sources. The network objective is to make an optimal decision while the data are processed in a distributed manner. New technologies, such as the next generation of wireless communication (5G), introduce practical issues, including imperfect communication, that must be addressed. In this paper, we study a generalized form of distributed algorithms for decision-making over decentralized data sources. We propose an optimal algorithm that uses optimal weighting to combine the resources of neighboring sources. We define an optimization problem and find its solution by applying the proposed algorithm. We evaluate the performance of the developed algorithm using both mathematical analysis and computer simulations. We introduce the conditions under which the convergence of the proposed algorithm is guaranteed and prove that the network error decreases considerably in comparison with some known modern methods.

Keywords Big data · Large scale · Optimization · Distributed · Imperfect communication
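As a concrete illustration of the general setting described above (not the algorithm proposed in this paper), the following minimal sketch runs a consensus-style distributed averaging update over a small network whose links add noise to every exchanged message. The ring topology, Metropolis combination weights, node count and noise level are all illustrative assumptions; the paper's contribution replaces such fixed weights with an optimal weighting.

```python
# Minimal sketch, assuming: 6 nodes on a ring, Metropolis combination weights,
# and zero-mean Gaussian noise on every link (stand-ins for the paper's setup).
import numpy as np

rng = np.random.default_rng(0)

# Ring topology (assumed): each node talks to its two neighbors.
n = 6
A = np.zeros((n, n), dtype=bool)
for i in range(n):
    A[i, (i + 1) % n] = True
    A[(i + 1) % n, i] = True

# Metropolis weights: a common doubly stochastic choice, used here only as a
# fixed-weight baseline, not the optimal weighting proposed in the paper.
deg = A.sum(axis=1)
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if A[i, j]:
            W[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
    W[i, i] = 1.0 - W[i].sum()

x = rng.normal(size=n)        # each node's local measurement/estimate
target = x.mean()             # ideal centralized decision (the network average)
link_noise_std = 0.05         # assumed model of imperfect communication

for _ in range(50):
    # noisy[j, i] is node j's state as received by node i over an imperfect link.
    noisy = x[:, None] + rng.normal(scale=link_noise_std, size=(n, n))
    x = np.array([
        W[i, i] * x[i]
        + sum(W[i, j] * noisy[j, i] for j in range(n) if A[i, j])
        for i in range(n)
    ])

print("max deviation from the centralized average:", np.abs(x - target).max())
```

With noisy links, this fixed-weight scheme leaves a residual error that does not vanish as the iterations proceed; this is the kind of degradation under imperfect communication that motivates optimizing the combination weights, as studied in this paper.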
* Reza Shahbazian
  [email protected]

1 Department of Mathematics and Computer Science, University of Calabria (UniCal), 87036 Rende, CS, Italy

2 Department of Mechanical, Energy and Management Engineering, University of Calabria (UniCal), 87036 Rende, CS, Italy
1 Introduction
Data sets whose size or type is beyond the ability of traditional databases to capture, manage and process with acceptable latency are called big data [1]. Big data is usually characterized by high volume, high velocity and high variety, although nowadays other "V" words, such as value, veracity, viscosity, virality and visualization, are also used to describe it [2]. Some of the known characteristics of big data are depicted in Fig. 1. Big data analytics (BDA) offers tremendous potential in terms of business value in a variety of fields, including health care [3], transportation [4], advertising [5], energy management [6] and financial services [7]. More recently, new generations of big data, such as multimedia big data, have been introduced [8]. In this new generation, data come in more media types and at higher volumes than typical big data. As a result, there is an increasing demand to develop new models and tools to analyze
Fig. 1 Illustration of the Vs of big data, including variety, volume, velocity, …