AdaGrad
ADAGRAD: adaptive gradient algorithm; Adaptive subgradient methods for online learning and stochastic optimization. We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.
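The abstract does not spell out an update rule. As a rough illustration only, the widely used diagonal form of AdaGrad can be sketched as follows. This is a minimal sketch under stated assumptions, not the full proximal-function framework the abstract describes: the function name `adagrad_step`, the parameters `lr` and `eps`, and the plain-list representation are all hypothetical choices made for this example.

```python
import math


def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """Apply one diagonal AdaGrad update in place and return (params, accum).

    Assumed (hypothetical) interface: params, grads and accum are
    equal-length lists of floats; accum holds the running sum of
    squared gradients for each coordinate.
    """
    for i, g in enumerate(grads):
        # Accumulate the squared gradient for this coordinate.
        accum[i] += g * g
        # Scale the step by the inverse root of the accumulated history,
        # so rarely active coordinates keep a comparatively large step.
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)
    return params, accum


# Usage: two coordinates whose gradients differ by four orders of magnitude.
params = [1.0, 1.0]
accum = [0.0, 0.0]
adagrad_step(params, [100.0, 0.01], accum, lr=0.1)
```

On the very first step the accumulator equals the squared gradient, so each coordinate moves by roughly `lr` regardless of its gradient's magnitude. That per-coordinate rescaling is what lets small or infrequent gradients (the "rarely seen features" of the abstract) still produce meaningful updates.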
References in zbMATH (referenced in 165 articles, 1 standard article)
Showing results 1 to 20 of 165, sorted by year.
- Chada, Neil K.; Tong, Xin T.: Convergence acceleration of ensemble Kalman inversion in nonlinear settings (2022)
- Clausen, Johan Bjerre Bach; Li, Hongyan: Big data driven order-up-to level model: application of machine learning (2022)
- Cui, Tao; Wang, Ziming; Xiang, Xueshuang: An efficient neural network method with plane wave activation functions for solving Helmholtz equation (2022)
- Goda, Takashi; Hironaka, Tomohiko; Kitade, Wataru; Foster, Adam: Unbiased MLMC stochastic gradient-based optimization of Bayesian experimental designs (2022)
- Huang, Shan; Zhu, Renchuan; Chang, Hongyu; Wang, Hui; Yu, Yun: Machine learning to approximate free-surface Green’s function and its application in wave-body interactions (2022)
- Jia, Yichen; Jeong, Jong-Hyeon: Deep learning for quantile regression under right censoring: deepquantreg (2022)
- Kim, Sehwan; Song, Qifan; Liang, Faming: Stochastic gradient Langevin dynamics with adaptive drifts (2022)
- Koh, Pang Wei; Steinhardt, Jacob; Liang, Percy: Stronger data poisoning attacks break data sanitization defenses (2022)
- Moll, Salvador; Pallardó, Vicent: An augmented Lagrangian model for signal segmentation (2022)
- Ollier, Edouard: Fast selection of nonlinear mixed effect models using penalized likelihood (2022)
- Pfannschmidt, Karlson; Gupta, Pritha; Haddenhorst, Björn; Hüllermeier, Eyke: Learning context-dependent choice functions (2022)
- Sharrock, Louis; Kantas, Nikolas: Joint online parameter estimation and optimal sensor placement for the partially observed stochastic advection-diffusion equation (2022)
- Tran-Dinh, Quoc; Pham, Nhan H.; Phan, Dzung T.; Nguyen, Lam M.: A hybrid stochastic optimization framework for composite nonconvex optimization (2022)
- Zheng, Yuchen; Xie, Yujia; Lee, Ilbin; Dehghanian, Amin; Serban, Nicoleta: Parallel subgradient algorithm with block dual decomposition for large-scale optimization (2022)
- Ashbrock, Jonathan; Powell, Alexander M.: Stochastic Markov gradient descent and training low-bit neural networks (2021)
- Barakat, Anas; Bianchi, Pascal: Convergence and dynamical behavior of the ADAM algorithm for nonconvex stochastic optimization (2021)
- Barakat, Anas; Bianchi, Pascal; Hachem, Walid; Schechtman, Sholom: Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance (2021)
- Dehghani, Hamidreza; Zilian, Andreas: A hybrid MGA-MSGD ANN training approach for approximate solution of linear elliptic PDEs (2021)
- De Loera, Jesús A.; Haddock, Jamie; Ma, Anna; Needell, Deanna: Data-driven algorithm selection and tuning in optimization and signal processing (2021)
- Ding, Man; Han, Congying; Guo, Tiande: High generalization performance structured self-attention model for knapsack problem (2021)