
Pegasos
 Referenced in 103 articles
[sw08752]
 analyze a simple and effective stochastic subgradient descent algorithm for solving the optimization problem ... example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require...
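
A minimal sketch of a Pegasos-style stochastic subgradient step for a linear SVM without a bias term. It assumes the usual objective (lambda/2)*||w||^2 plus the average hinge loss, the step size 1/(lambda*t), and the optional projection onto the ball of radius 1/sqrt(lambda). The function name `pegasos`, its defaults, and the toy data are illustrative assumptions, not the API of any released package.

```python
import math
import random

def pegasos(X, y, lam=0.1, iters=1000, seed=0):
    """Pegasos-style SGD sketch (hypothetical helper): minimizes
    lam/2 * ||w||^2 + mean hinge loss for a linear SVM with no bias term."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))           # pick one training example uniformly
        eta = 1.0 / (lam * t)               # step size 1/(lambda * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        # Regularizer part of the subgradient step: shrink w by (1 - eta * lam).
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Hinge-loss part: only applied when the margin is violated (< 1).
        if margin < 1.0:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
        # Optional projection onto the ball of radius 1/sqrt(lam).
        norm = math.sqrt(sum(wj * wj for wj in w))
        radius = 1.0 / math.sqrt(lam)
        if norm > radius:
            w = [wj * radius / norm for wj in w]
    return w

# Hypothetical toy data, linearly separable through the origin.
X = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w = pegasos(X, y)
```

On this toy data every hinge update adds a multiple of a vector with positive entries, so the returned w classifies all four points correctly.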

HOGWILD
 Referenced in 65 articles
[sw28396]
 Lock-Free Approach to Parallelizing Stochastic Gradient Descent. Stochastic Gradient Descent (SGD) is a popular...
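
The lock-free idea can be sketched with Python threads that share one parameter vector and update it with no lock, here on a hypothetical sparse least-squares problem where each example touches only two coordinates. This only illustrates the update pattern: CPython's global interpreter lock serializes bytecode, so the sketch shows no real parallel speedup. The function `hogwild_sgd`, its parameters, and the data are assumptions for illustration.

```python
import random
import threading

def hogwild_sgd(data, dim, n_threads=4, steps_per_thread=2000, lr=0.1, seed=0):
    """Lock-free SGD sketch (hypothetical helper): all threads read and write
    the shared list w without any synchronization."""
    w = [0.0] * dim  # shared parameters, deliberately unprotected

    def worker(tid):
        rng = random.Random(seed + tid)
        for _ in range(steps_per_thread):
            x, target = data[rng.randrange(len(data))]  # x: sparse dict {index: value}
            err = sum(w[j] * v for j, v in x.items()) - target
            # Squared-loss gradient is nonzero only on the example's active coordinates.
            for j, v in x.items():
                w[j] -= lr * err * v  # no lock: concurrent updates may interleave

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return w

# Hypothetical noise-free data: each example activates 2 of 4 features.
rng = random.Random(42)
w_true = [1.0, -2.0, 0.5, 3.0]
data = []
for _ in range(200):
    x = {j: rng.uniform(-1.0, 1.0) for j in rng.sample(range(4), 2)}
    data.append((x, sum(w_true[j] * v for j, v in x.items())))
w = hogwild_sgd(data, dim=4)
```

Sparse updates are what make skipping the lock tolerable: two threads rarely touch the same coordinate at the same moment, and an occasional lost update only perturbs the iterate slightly.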

ADADELTA
 Referenced in 57 articles
[sw39429]
 dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time ... minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning...
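
A per-dimension sketch of the ADADELTA update as it is commonly stated: keep decaying averages of squared gradients and of squared updates, and scale each step by the ratio of their root mean squares, so no global learning rate is set. The decay rho = 0.95 and eps = 1e-6 are assumed defaults, and the function `adadelta` and the quadratic test objective are hypothetical.

```python
import math

def adadelta(grad_fn, x0, steps=5000, rho=0.95, eps=1e-6):
    """ADADELTA-style optimizer sketch (hypothetical helper)."""
    x = list(x0)
    eg2 = [0.0] * len(x)   # running average of squared gradients, E[g^2]
    edx2 = [0.0] * len(x)  # running average of squared updates, E[dx^2]
    for _ in range(steps):
        g = grad_fn(x)
        for i in range(len(x)):
            eg2[i] = rho * eg2[i] + (1 - rho) * g[i] ** 2
            # Step = -(RMS of previous updates / RMS of gradients) * gradient.
            dx = -math.sqrt(edx2[i] + eps) / math.sqrt(eg2[i] + eps) * g[i]
            edx2[i] = rho * edx2[i] + (1 - rho) * dx ** 2
            x[i] += dx
    return x

# Hypothetical objective 0.5 * ||x - target||^2, whose gradient is x - target.
target = [3.0, -1.0]
x = adadelta(lambda p: [pi - ti for pi, ti in zip(p, target)], [0.0, 0.0])
```

Because the numerator starts at sqrt(eps), early steps are small and grow as E[dx^2] accumulates; this is the method adapting its own effective step size per dimension.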

SGD-QN
 Referenced in 26 articles
[sw19411]
 careful quasi-Newton stochastic gradient descent. The SGD-QN algorithm is a stochastic gradient descent ... fast as a first-order stochastic gradient descent but requires fewer iterations to achieve...

SGDR
 Referenced in 15 articles
[sw30752]
 SGDR: Stochastic Gradient Descent with Warm Restarts. Restart techniques are common in gradient-free optimization ... gradient-based optimization to improve the rate of convergence in accelerated gradient schemes to deal ... simple warm restart technique for stochastic gradient descent to improve its anytime performance when training...
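
A warm-restart schedule of this kind can be sketched as cosine annealing from eta_max down to eta_min within each cycle, jumping back to eta_max after T_i epochs and multiplying the cycle length by T_mult at each restart. The parameter values are placeholders, the helper `sgdr_lr` is hypothetical, and this sketch evaluates the schedule only at whole epochs for simplicity.

```python
import math

def sgdr_lr(epoch, eta_min=0.0, eta_max=0.1, t0=10, t_mult=2):
    """Cosine annealing with warm restarts (hypothetical helper; t0 must be > 0):
    lr = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i))."""
    t_i, t_cur = t0, epoch
    # Skip over completed cycles; each new cycle is t_mult times longer.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

schedule = [sgdr_lr(e) for e in range(40)]
```

With t0=10 and t_mult=2, restarts fall at epochs 10 and 30, where the rate jumps back to eta_max; in between it decays smoothly, which is what lets a snapshot taken near the end of each cycle serve as a usable "anytime" model.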

CNTK
 Referenced in 9 articles
[sw21056]
 recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation...

LargeVis
 Referenced in 6 articles
[sw34905]
 effectively optimized through asynchronous stochastic gradient descent with a linear time complexity. The whole procedure...

BudgetedSVM
 Referenced in 4 articles
[sw10893]
 rank Linearization SVM, and Budgeted Stochastic Gradient Descent. BudgetedSVM trains models with accuracy comparable...

gradDescentR
 Referenced in 1 article
[sw38962]
 partially to reduce the computation load. Stochastic Gradient Descent (SGD), which is an optimization ... based algorithm to minimize stochastic step to average. Momentum Gradient Descent (MGD), which ... gradient-descent-based algorithm that mean and variance moment to do adaptive learning. Stochastic Variance ... converging by reducing the gradient. Semi Stochastic Gradient Descent (SSGD), which is an SGD-based...
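
Of the methods listed, momentum gradient descent is the simplest to sketch. The version below is the classic heavy-ball form, in which a velocity v accumulates an exponentially decaying sum of gradients; it is written in Python rather than R and is not the gradDescentR implementation. The helper `momentum_gd`, the step size, the momentum coefficient, and the test objective are all assumptions.

```python
def momentum_gd(grad_fn, x0, lr=0.1, gamma=0.9, steps=500):
    """Heavy-ball momentum sketch (hypothetical helper): v <- gamma*v + lr*grad, x <- x - v."""
    x = list(x0)
    v = [0.0] * len(x)  # velocity: decaying sum of past scaled gradients
    for _ in range(steps):
        g = grad_fn(x)
        for i in range(len(x)):
            v[i] = gamma * v[i] + lr * g[i]
            x[i] -= v[i]
    return x

# Hypothetical objective 0.5 * ||x - target||^2, whose gradient is x - target.
target = [3.0, -1.0]
x = momentum_gd(lambda p: [pi - ti for pi, ti in zip(p, target)], [0.0, 0.0])
```

Carrying the velocity across steps lets progress build up along consistent gradient directions; on this quadratic the error shrinks by roughly sqrt(gamma) per step in magnitude.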

ADMMSoftmax
 Referenced in 2 articles
[sw32744]
 Krylov, a quasi-Newton, and a stochastic gradient descent method...

DeepTrack
 Referenced in 2 articles
[sw27576]
 accumulation. Second, we enhance the ordinary Stochastic Gradient Descent approach in CNN training with...

Jensen
 Referenced in 1 article
[sw26651]
 algorithms (including Gradient Descent, L-BFGS, Stochastic Gradient Descent, Conjugate Gradient, etc.), and a family...

libFM
 Referenced in 1 article
[sw29652]
 implementation for factorization machines that features stochastic gradient descent (SGD) and alternating least squares...

deepNN
 Referenced in 1 article
[sw38663]
 perceptron, different activation functions, regularisation strategies, stochastic gradient descent and dropout. Thanks...

ProPPR
 Referenced in 1 article
[sw32915]
 learning can be performed using parallel stochastic gradient descent with a supervised personalized PageRank algorithm...

MLitB
 Referenced in 1 article
[sw30254]
 deep neural networks with synchronized, distributed stochastic gradient descent. MLitB offers several important opportunities...

SINE
 Referenced in 1 article
[sw32344]
 missing information on representation learning. A stochastic gradient descent based online algorithm is derived...

DSelect-k
 Referenced in 1 article
[sw41672]
 using first-order methods, such as stochastic gradient descent, and offers explicit control over...

FIt-SNE
 Referenced in 2 articles
[sw34901]
 Interpolation-based t-SNE (FIt-SNE). t-Stochastic Neighborhood Embedding (t-SNE) is a highly ... algorithm to approximate the gradient at each iteration of gradient descent. We accelerated this implementation...

MetaGrad
 Referenced in 1 article
[sw40373]
 also various types of stochastic and nonstochastic functions without any curvature. We prove this ... adapts automatically to the size of the gradients. Its main feature is that it simultaneously ... which they consistently outperform both online gradient descent and AdaGrad...