A Non-Asymptotic Analysis of Network Independence for Distributed Stochastic Gradient Descent

6 Jun 2019 · Shi Pu, Alex Olshevsky, Ioannis Ch. Paschalidis

This paper is concerned with minimizing the average of $n$ cost functions over a network in which agents may communicate and exchange information with their peers. Specifically, we consider the setting where only noisy gradient information is available. To solve the problem, we study the standard distributed stochastic gradient descent (DSGD) method and perform a non-asymptotic convergence analysis. For strongly convex and smooth objective functions, we not only show that DSGD asymptotically achieves the optimal network-independent convergence rate of centralized stochastic gradient descent (SGD), but also explicitly identify the non-asymptotic convergence rate as a function of characteristics of the objective functions and the network. Furthermore, we derive the time needed for DSGD to approach the asymptotic convergence rate, which behaves as $K_T=\mathcal{O}\left(\frac{n}{(1-\rho_w)^2}\right)$, where $1-\rho_w$ denotes the spectral gap of the mixing matrix of the communicating agents. Finally, we construct a "hard" optimization problem for which the transient time needed for DSGD to approach the asymptotic convergence rate is lower bounded by $\Omega\left(\frac{n}{(1-\rho_w)^2}\right)$, implying that the obtained bound is sharp.
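To make the algorithmic setup concrete, the following is a minimal sketch of a DSGD iteration of the form analyzed in the paper: each agent mixes its iterate with its neighbors' through a doubly stochastic matrix $W$, then takes a step along a noisy local gradient with a diminishing step size. The quadratic local costs, ring topology, noise level, and step-size parameters below are illustrative assumptions for the sketch, not taken from the paper.

```python
import numpy as np

# Illustrative DSGD sketch (not the authors' code).
# Each agent i holds a local quadratic cost f_i(x) = 0.5 * ||A_i x - b_i||^2,
# so the global objective is the average (1/n) * sum_i f_i(x).

rng = np.random.default_rng(0)
n, d = 10, 5                                         # agents, problem dimension
A = rng.standard_normal((n, d, d)) + 2 * np.eye(d)   # local data (assumed)
b = rng.standard_normal((n, d))
sigma = 0.1                                          # gradient-noise std (assumed)

# Ring-topology mixing matrix W: symmetric and doubly stochastic by construction.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

def noisy_grad(i, x):
    """Stochastic gradient of f_i at x: exact gradient plus Gaussian noise."""
    return A[i].T @ (A[i] @ x - b[i]) + sigma * rng.standard_normal(d)

X = np.zeros((n, d))                 # row i is agent i's iterate
theta, K0 = 1.0, 50                  # diminishing step size alpha_k = theta / (k + K0)
for k in range(2000):
    alpha = theta / (k + K0)
    # DSGD step: consensus (mixing) followed by a local stochastic gradient step.
    X = W @ X - alpha * np.array([noisy_grad(i, X[i]) for i in range(n)])

x_bar = X.mean(axis=0)               # network average, the quantity compared to centralized SGD
```

Running the sketch with a finer-connected $W$ (larger spectral gap $1-\rho_w$) shortens the transient phase before the averaged iterate tracks the centralized SGD rate, which is the effect the paper quantifies.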

