SGDR

SGDR: Stochastic Gradient Descent with Warm Restarts. Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial warm restarts are also gaining popularity in gradient-based optimization to improve the rate of convergence in accelerated gradient schemes to deal with ill-conditioned functions. In this paper, we propose a simple warm restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks. We empirically study its performance on the CIFAR-10 and CIFAR-100 datasets, where we demonstrate new state-of-the-art results at 3.14% and 16.21%, respectively. We also demonstrate its advantages on a dataset of EEG recordings and on a downsampled version of the ImageNet dataset. Our source code is available at https://github.com/loshchil/SGDR
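The warm-restart schedule described above anneals the learning rate from an upper bound down to a lower bound with a cosine curve, then resets it to the upper bound and repeats over progressively longer cycles. The following is a minimal sketch of that schedule; the function name and parameter names are illustrative, with `t_0` the length of the first cycle and `t_mult` the factor by which each subsequent cycle grows:

```python
import math

def sgdr_lr(step, eta_min=0.0, eta_max=0.1, t_0=10, t_mult=2):
    """Cosine-annealed learning rate with warm restarts (SGDR-style sketch).

    step    -- global iteration (or epoch) counter, starting at 0
    eta_min -- lower bound of the learning rate
    eta_max -- value the rate is reset to at each warm restart
    t_0     -- length of the first annealing cycle
    t_mult  -- factor by which each cycle grows (t_mult >= 1)
    """
    t_i, t_cur = t_0, step
    # Subtract completed cycles to locate the position within the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    # Cosine annealing within the current cycle of length t_i.
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))
```

At `step = 0` and at every restart boundary the rate equals `eta_max`, and it decays toward `eta_min` within each cycle. PyTorch ships a ready-made implementation of this schedule as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`.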


References in zbMATH (referenced in 12 articles)


  1. Bakhtin, Anton; Deng, Yuntian; Gross, Sam; Ott, Myle; Ranzato, Marc’aurelio; Szlam, Arthur: Residual energy-based models for text (2021)
  2. Wang, Qingzhong; Zhang, Pengfei; Xiong, Haoyi; Zhao, Jian: Face.evoLVe: a high-performance face recognition library (2021) arXiv
  3. Eimer, Theresa; Biedenkapp, André; Reimer, Maximilian; Adriaensen, Steven; Hutter, Frank; Lindauer, Marius: DACBench: a benchmark library for dynamic algorithm configuration (2021) arXiv
  4. Yeo, Kyongmin; Grullon, Dylan E. C.; Sun, Fan-Keng; Boning, Duane S.; Kalagnanam, Jayant R.: Variational inference formulation for a model-free simulation of a dynamical system with unknown parameters by a recurrent neural network (2021)
  5. Banert, Sebastian; Ringh, Axel; Adler, Jonas; Karlsson, Johan; Öktem, Ozan: Data-driven nonsmooth optimization (2020)
  6. Chen, Yiming; Pan, Tianci; He, Cheng; Cheng, Ran: Efficient evolutionary deep neural architecture search (NAS) by noisy network morphism mutation (2020)
  7. Kang, Dongseok; Ahn, Chang Wook: Efficient neural network space with genetic search (2020)
  8. Mohamed, Shakir; Rosca, Mihaela; Figurnov, Michael; Mnih, Andriy: Monte Carlo gradient estimation in machine learning (2020)
  9. Sun, Ruo-Yu: Optimization for deep learning: an overview (2020)
  10. Tan, Hao; He, Cheng; Tang, Dexuan; Cheng, Ran: Efficient evolutionary neural architecture search (NAS) by modular inheritable crossover (2020)
  11. Zhou, Kaiyang; Xiang, Tao: Torchreid: a library for deep learning person re-identification in PyTorch (2019) arXiv
  12. Hendrycks, Dan; Gimpel, Kevin: Gaussian error linear units (GELUs) (2016) arXiv