DiSCO: Distributed Optimization for Self-Concordant Empirical Loss.
We propose a new distributed algorithm for empirical risk minimization in machine learning. The algorithm is based on an inexact damped Newton method, where the inexact Newton steps are computed by a distributed preconditioned conjugate gradient method. We analyze its iteration complexity and communication efficiency for minimizing self-concordant empirical loss functions, and discuss the results for distributed ridge regression, logistic regression and binary classification with a smoothed hinge loss. In a standard setting for supervised learning, where the n data points are sampled i.i.d. and the regularization parameter scales as 1/√n, we show that the proposed algorithm is communication efficient: the required number of communication rounds does not increase with the sample size n and grows only slowly with the number of machines.
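The following is a minimal single-process sketch of the idea described above, not the DiSCO implementation itself. It applies an inexact damped Newton method to ridge regression, with each Newton step obtained from a conjugate gradient solver that only needs Hessian-vector products. Several things here are assumptions made for illustration: the data is held as an in-memory list of shards that stand in for machines, plain (unpreconditioned) conjugate gradient replaces the distributed preconditioned variant, the damping factor 1/(1 + delta) is the standard choice for self-concordant functions, and the names `conjugate_gradient` and `damped_newton_ridge` are hypothetical.

```python
import numpy as np


def conjugate_gradient(hess_vec, b, tol=1e-10, max_iter=200):
    """Approximately solve H v = b using only Hessian-vector products."""
    v = np.zeros_like(b)
    r = b.copy()  # residual b - H v, with v starting at zero
    p = r.copy()  # current search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol:
            break
        Hp = hess_vec(p)
        alpha = rs_old / (p @ Hp)
        v += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return v


def damped_newton_ridge(shards, lam, num_iters=50, tol=1e-9):
    """Minimize (1/2n) * sum_i ||X_i w - y_i||^2 + (lam/2) * ||w||^2.

    `shards` is a list of (X_i, y_i) pairs standing in for machines
    (an assumption of this sketch). Each sum over shards below marks a
    point where a real distributed system would have to communicate.
    """
    n = sum(X.shape[0] for X, _ in shards)
    d = shards[0][0].shape[1]
    w = np.zeros(d)

    def hess_vec(u):
        # Sum of per-shard Hessian-vector products plus the ridge term.
        return sum(X.T @ (X @ u) for X, _ in shards) / n + lam * u

    for _ in range(num_iters):
        # Sum of per-shard gradients plus the ridge term.
        grad = sum(X.T @ (X @ w - y) for X, y in shards) / n + lam * w
        v = conjugate_gradient(hess_vec, grad)  # inexact Newton direction
        delta = np.sqrt(v @ hess_vec(v))        # approximate Newton decrement
        if delta <= tol:
            break
        w = w - v / (1.0 + delta)               # damped Newton update
    return w
```

Because the ridge objective is quadratic, the result can be checked against the closed-form solution of the normal equations, which makes it a convenient test case for the sketch. Logistic regression or a smoothed hinge loss would require replacing the gradient and Hessian-vector product with the ones for that loss.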
References in zbMATH (referenced in 12 articles)
- Zhang, Jiaqi; You, Keyou; Başar, Tamer: Distributed adaptive Newton methods with global superlinear convergence (2022)
- Ostrovskii, Dmitrii M.; Bach, Francis: Finite-sample analysis of (M)-estimators using self-concordance (2021)
- Lee, Ching-pei; Chang, Kai-Wei: Distributed block-diagonal approximation methods for regularized empirical risk minimization (2020)
- Richards, Dominic; Rebeschini, Patrick: Graph-dependent implicit regularisation for distributed stochastic subgradient descent (2020)
- Sun, Tianxiao; Necoara, Ion; Tran-Dinh, Quoc: Composite convex optimization with global and local inexact oracles (2020)
- Yuan, Xiao-Tong; Li, Ping: On convergence of distributed approximate Newton methods: globalization, sharper bounds and beyond (2020)
- Jordan, Michael I.; Lee, Jason D.; Yang, Yun: Communication-efficient distributed statistical inference (2019)
- Sun, Tianxiao; Quoc, Tran-Dinh: Generalized self-concordant functions: a recipe for Newton-type methods (2019)
- Xiao, Lin; Yu, Adams Wei; Lin, Qihang; Chen, Weizhu: DSCOVR: randomized primal-dual block coordinate algorithms for asynchronous distributed optimization (2019)
- Jain, Prateek; Kakade, Sham M.; Kidambi, Rahul; Netrapalli, Praneeth; Sidford, Aaron: Parallelizing stochastic gradient descent for least squares regression: mini-batching, averaging, and model misspecification (2018)
- Lee, Jason D.; Lin, Qihang; Ma, Tengyu; Yang, Tianbao: Distributed stochastic variance reduced gradient methods by sampling extra data with replacement (2017)
- Zheng, Shun; Wang, Jialei; Xia, Fen; Xu, Wei; Zhang, Tong: A general distributed dual coordinate optimization framework for regularized loss minimization (2017)