
Pegasos
 Referenced in 103 articles
[sw08752]
 simple and effective stochastic subgradient descent algorithm for solving the optimization problem cast ... contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ϵ²) iterations...

CG_DESCENT
 Referenced in 135 articles
[sw04813]
 Algorithm 851: CG_DESCENT. A conjugate gradient method with guaranteed descent. Recently, a new nonlinear ... conjugate gradient scheme was developed which satisfies the descent condition g_k^T d_k...

Wirtinger Flow
 Referenced in 110 articles
[sw34175]
 computational complexity, much like in a gradient descent scheme. The main contribution is that this...

HOGWILD
 Referenced in 65 articles
[sw28396]
 Lock-Free Approach to Parallelizing Stochastic Gradient Descent. Stochastic Gradient Descent (SGD) is a popular ... associated optimization problem is sparse, meaning most gradient updates only modify small parts...

ADADELTA
 Referenced in 59 articles
[sw39429]
 dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time ... minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning ... learning rate and appears robust to noisy gradient information, different model architecture choices, various data...

mboost
 Referenced in 67 articles
[sw07331]
 package mboost: Model-Based Boosting. Functional gradient descent algorithm (boosting) for optimizing general risk functions...

SGD-QN
 Referenced in 28 articles
[sw19411]
 careful quasi-Newton stochastic gradient descent. The SGD-QN algorithm is a stochastic gradient descent ... fast as a first-order stochastic gradient descent but requires less iterations to achieve...

BADMM
 Referenced in 36 articles
[sw20288]
 mirror descent algorithm (MDA) generalizes gradient descent by using a Bregman divergence to replace squared...

SNLSDP
 Referenced in 38 articles
[sw05127]
 starting point for a gradient descent method with backtracking line search to solve the smooth...

LASSO
 Referenced in 33 articles
[sw02850]
 gradient descent algorithm for LASSO. LASSO is a useful method to achieve the shrinkage...

PINNsNTK
 Referenced in 17 articles
[sw42058]
 networks behave during their training via gradient descent. More importantly, even less is known about ... infinite width limit during training via gradient descent. Specifically, we derive the NTK of PINNs ... fundamental pathology, we propose a novel gradient descent algorithm that utilizes the eigenvalues...

SGDR
 Referenced in 15 articles
[sw30752]
 SGDR: Stochastic Gradient Descent with Warm Restarts. Restart techniques are common in gradient-free optimization ... simple warm restart technique for stochastic gradient descent to improve its anytime performance when training...

Entropy-SGD
 Referenced in 20 articles
[sw41231]
 Entropy-SGD: Biasing Gradient Descent Into Wide Valleys. This paper proposes a new optimization algorithm ... inner loop to compute the gradient of the local entropy before each update...

Vowpal Wabbit
 Referenced in 12 articles
[sw28398]
 available with the baseline being sparse gradient descent (GD) on a loss function (several...

DARTS
 Referenced in 11 articles
[sw36213]
 efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn...

TIGRA
 Referenced in 40 articles
[sw02333]
 presented. The TIGRA (Tikhonov-gradient method) algorithm proposed uses steepest descent iterations in an inner...

CNTK
 Referenced in 9 articles
[sw21056]
 recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation...

Optspace
 Referenced in 8 articles
[sw12630]
 Optspace: A Gradient Descent Algorithm on the Grassmann Manifold for Matrix Completion. We consider...

neuraltangents
 Referenced in 5 articles
[sw39529]
 using exact Bayesian inference or using gradient descent via the Neural Tangent Kernel. Additionally, Neural ... Tangents provides tools to study gradient descent training dynamics of wide but finite networks...

DISCO
 Referenced in 7 articles
[sw13152]
 stages, the configurations are improved by gradient descent refinement. The algorithm is applied...