ADADELTA: An Adaptive Learning Rate Method

We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large scale voice dataset in a distributed cluster environment.
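The per-dimension update the abstract describes can be sketched as follows. This is a minimal single-parameter illustration of the ADADELTA rule: two decaying averages (of squared gradients and of squared updates) set the step size, so no learning rate is tuned by hand. The decay rate `rho` and conditioning constant `eps` here are illustrative defaults, and the function and variable names are this sketch's own, not from the paper.

```python
import math

def adadelta_step(x, grad, state, rho=0.95, eps=1e-6):
    """One ADADELTA update for a single parameter.

    state holds two running accumulators:
      'Eg2'  - decaying average of squared gradients, E[g^2]
      'Edx2' - decaying average of squared updates,   E[dx^2]
    rho and eps values are illustrative, not prescribed here.
    """
    state['Eg2'] = rho * state['Eg2'] + (1 - rho) * grad * grad
    # Ratio of RMS[dx] to RMS[g] acts as a per-dimension step size,
    # so no global learning rate appears in the update.
    dx = -math.sqrt(state['Edx2'] + eps) / math.sqrt(state['Eg2'] + eps) * grad
    state['Edx2'] = rho * state['Edx2'] + (1 - rho) * dx * dx
    return x + dx, state

# Toy usage: minimize f(x) = x^2 (gradient 2x) starting from x = 1.0.
x, state = 1.0, {'Eg2': 0.0, 'Edx2': 0.0}
for _ in range(500):
    x, state = adadelta_step(x, 2.0 * x, state)
print(x)
```

Note how the update starts with very small steps (the `Edx2` accumulator begins at zero) and grows as progress is made, which is characteristic of the method's window-based adaptation.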

References in zbMATH (referenced in 57 articles)

Showing results 21 to 40 of 57.
Sorted by year (citations)
  21. De, Subhayan; Maute, Kurt; Doostan, Alireza: Bi-fidelity stochastic gradient descent for structural optimization under uncertainty (2020)
  22. Do, Dieu T. T.; Nguyen-Xuan, H.; Lee, Jaehong: Material optimization of tri-directional functionally graded plates by using deep neural network and isogeometric multimesh design approach (2020)
  23. Erway, Jennifer B.; Griffin, Joshua; Marcia, Roummel F.; Omheni, Riadh: Trust-region algorithms for training responses: machine learning methods using indefinite Hessian approximations (2020)
  24. Geng, Zhenglin; Johnson, Daniel; Fedkiw, Ronald: Coercing machine learning to output physically accurate results (2020)
  25. Göttlich, Simone; Knapp, Stephan: Artificial neural networks for the estimation of pedestrian interaction forces (2020)
  26. Henderson, Donna; Lunter, Gerton: Efficient inference in state-space models through adaptive learning in online Monte Carlo expectation maximization (2020)
  27. Karumuri, Sharmila; Tripathy, Rohit; Bilionis, Ilias; Panchal, Jitesh: Simulator-free solution of high-dimensional stochastic elliptic partial differential equations using deep neural networks (2020)
  28. Kylasa, Sudhir; Fang, Chih-Hao; Roosta, Fred; Grama, Ananth: Parallel optimization techniques for machine learning (2020)
  29. Lu, Lu; Shin, Yeonjong; Su, Yanhui; Karniadakis, George Em: Dying ReLU and initialization: theory and numerical examples (2020)
  30. Martens, James: New insights and perspectives on the natural gradient method (2020)
  31. Ruehle, Fabian: Data science applications to string theory (2020)
  32. Smith, Michael Stanley; Loaiza-Maya, Rubén; Nott, David J.: High-dimensional copula variational approximation through transformation (2020)
  33. Song, Hwanjun; Kim, Sundong; Kim, Minseok; Lee, Jae-Gil: Ada-boundary: accelerating DNN training via adaptive boundary batch selection (2020)
  34. Sun, Qi; Du, Qiang: A distributed optimal control problem with averaged stochastic gradient descent (2020)
  35. Sun, Ruo-Yu: Optimization for deep learning: an overview (2020)
  36. Tran, M.-N.; Nguyen, N.; Nott, D.; Kohn, R.: Bayesian deep net GLM and GLMM (2020)
  37. Ward, Rachel; Wu, Xiaoxia; Bottou, Leon: AdaGrad stepsizes: sharp convergence over nonconvex landscapes (2020)
  38. Yan, Liang; Zhou, Tao: An adaptive surrogate modeling based on deep neural networks for large-scale Bayesian inverse problems (2020)
  39. Geete, Kanu; Pandey, Manish: A noise-based stabilizer for convolutional neural networks (2019)
  40. Ismail Fawaz, Hassan; Forestier, Germain; Weber, Jonathan; Idoumghar, Lhassane; Muller, Pierre-Alain: Deep learning for time series classification: a review (2019)