Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks

  • PDF / 575,886 Bytes
  • 22 Pages / 439.37 x 666.142 pts Page_size
  • 59 Downloads / 172 Views

DOWNLOAD

REPORT


Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks Zengfeng Huang1

· Ke Yi2 · Qin Zhang3

Received: 27 October 2016 / Accepted: 23 November 2018 / Published online: 11 December 2018 © Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract We show that randomization can lead to significant improvements for a few fundamental problems in distributed tracking. Our basis is the count-tracking problem, where over time, and there are k players, each holding a counter n i that gets incremented  the goal is to track an ε-approximation of their sum n = i n i continuously at all times, using minimum communication. While the deterministic communication complexity of the problem is Θ(k/ε · log N ), where N is the final value of n when the tracking finishes, √ we show that with randomization, the communication cost can be reduced to Θ( k/ε ·log N ). Our algorithm is simple and uses only O(1) space at each player, while the lower bound holds even assuming each player has infinite computing power. Then, we extend our techniques to two related distributed tracking problems: frequency-tracking and rank-tracking, and obtain similar improvements over previous deterministic algorithms. Both problems are of central importance in large data monitoring and analysis, and have been extensively studied in the literature. Keywords Continuous distributed tracking · Randomized algorithms · Distributed streaming

A preliminary version of this article was presented at the ACM Symposium on Principles of Database Systems (PODS), 2012.

B

Zengfeng Huang [email protected] Ke Yi [email protected] Qin Zhang [email protected]

1

School of Data Science, Fudan University, Shanghai, China

2

The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong

3

Indiana University Bloomington, Bloomington, USA

123

Algorithmica (2019) 81:2222–2243

2223

1 Introduction We start with a very basic problem in distributed tracking, what we call count-tracking. There are k players each holding a counter n i that is initially 0. Over time, the counters get incremented and we denote by n i (t) the value of the counter  n i at time t. The ˆ goal is to track an ε-approximation of the total count n(t) = i n i (t), i.e., an n(t) such that (1 − ε)n(t) ≤ n(t) ˆ ≤ (1 + ε)n(t),1 continuously at all times. There is a coordinator whose job is to maintain such an n(t), ˆ and will try to do so using minimum communication with the k players (the formal model of computation will be defined shortly). There is a trivial solution to the count-tracking problem: Every time a counter n i has increased by a 1 + ε factor, the player informs the coordinator of the change. Thus, the coordinator always has an ε-approximation of every n i , hence an ε-approximation of their sum n. Letting N denote the final value of n, simple analysis shows that the communication cost of this algorithm is O(k/ε · log N ).2 This algorithm was actually used in [16] for solving essentially the same problem, which also provided many practica