Fully asynchronous stochastic coordinate descent: a tight lower bound on the parallelism achieving linear speedup



Mathematical Programming, Series A

Yun Kuen Cheung¹ · Richard Cole² · Yixin Tao²

Received: 7 May 2019 / Accepted: 4 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2020

Abstract We seek tight bounds on the viable parallelism in asynchronous implementations of coordinate descent that achieve linear speedup. We focus on asynchronous coordinate descent (ACD) algorithms on convex functions which consist of the sum of a smooth convex part and a possibly non-smooth separable convex part. We quantify the shortfall in progress compared to the standard sequential stochastic gradient descent. This leads to a simple yet tight analysis of the standard stochastic ACD in a partially asynchronous environment, generalizing and improving the bounds in prior work. We also give a considerably more involved analysis for general asynchronous environments in which the only constraint is that each update can overlap with at most q others. The new lower bound on the maximum degree of parallelism attaining linear speedup is tight and improves the best prior bound almost quadratically.

Keywords Asynchronous coordinate descent · Stochastic coordinate descent · Linear speedup · Amortized analysis

Mathematics Subject Classification 90C06 Large-scale problems in mathematical programming · 90C25 Convex programming
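To make the setting concrete, here is a minimal Python sketch of the sequential stochastic proximal coordinate descent baseline on an objective of the form F(x) = f(x) + Σ_k Ψ_k(x_k), instantiated with a least-squares smooth part and the Lasso penalty Ψ_k(x_k) = λ|x_k|. The choice of f, the step sizes, and all identifiers are illustrative assumptions; this is only a sketch of the sequential baseline that speedup is measured against, not the paper's asynchronous algorithm.

```python
import numpy as np

# Illustrative sketch only: sequential stochastic proximal coordinate descent for
# F(x) = f(x) + sum_k Psi_k(x_k), with the assumed choices f(x) = 0.5*||Ax - b||^2
# and Psi_k(x_k) = lam*|x_k| (the Lasso penalty). Not the paper's ACD algorithm.

def soft_threshold(z, t):
    """Proximal operator of t*|.| (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def coordinate_descent_lasso(A, b, lam, n_epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    # Coordinate-wise Lipschitz constants of grad f: L_k = ||A[:, k]||^2.
    L = (A ** 2).sum(axis=0)
    residual = A @ x - b          # maintained so each coordinate step costs O(m)
    for _ in range(n_epochs * n):
        k = rng.integers(n)       # sample a coordinate uniformly at random
        g_k = A[:, k] @ residual  # k-th partial derivative of f at x
        # Proximal coordinate step with step size 1/L_k, then apply Psi_k's prox.
        x_new_k = soft_threshold(x[k] - g_k / L[k], lam / L[k])
        residual += A[:, k] * (x_new_k - x[k])
        x[k] = x_new_k
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 50))
    x_true = np.zeros(50)
    x_true[:5] = 1.0
    b = A @ x_true + 0.01 * rng.standard_normal(200)
    x_hat = coordinate_descent_lasso(A, b, lam=0.1)
    print("support recovered:", np.flatnonzero(np.abs(x_hat) > 1e-3))
```

Each step touches a single coordinate and costs O(m) because the residual Ax − b is maintained incrementally. In an asynchronous implementation, many such steps run concurrently and may read stale coordinates of x; that staleness is the source of the shortfall in progress quantified above.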

Part of the work was done while Yun Kuen Cheung held positions at the Courant Institute, NYU, at the Faculty of Computer Science, University of Vienna, and at the Max Planck Institute for Informatics, Saarland Informatics Campus. He was supported in part by NSF Grant CCF-1217989, the Vienna Science and Technology Fund (WWTF) project ICT10-002, Singapore NRF 2018 Fellowship NRF-NRFF2018-07, and MOE AcRF Tier 2 Grant 2016-T2-1-170. Additionally, the research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC Grant Agreement No. 340506. Richard Cole and Yixin Tao's work was supported in part by NSF Grants CCF-1217989, CCF-1527568 and CCF-1909538.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10107-020-01552-8) contains supplementary material, which is available to authorized users.

Corresponding author: Yixin Tao [email protected]

Extended author information available on the last page of the article


1 Introduction

We consider the problem of finding an (approximate) minimum point of a convex function F : Rⁿ → R of the form

F(x) = f(x) + \sum_{k=1}^{n} \Psi_k(x_k),

where f : Rⁿ → R is a smooth convex function,¹ and each Ψ_k : R → R is a univariate convex function, but may be non-smooth. Such functions occur in many data analysis and machine learning problems, such as linear regression (e.g., the Lasso approach to regularized least squares [28]), where Ψ_k(x_k) = |x_k|, logistic regression [21], ridge regression [26], where Ψ_k(x_k) is a quadratic function, and