ADADELTA

ADADELTA: An Adaptive Learning Rate Method. We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first-order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities, and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large-scale voice dataset in a distributed cluster environment.
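The per-dimension update rule described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the ADADELTA update (decaying averages of squared gradients and squared updates, with the ratio of their RMS values replacing a global learning rate); the function name, the toy objective, and the default values of the decay constant `rho` and the conditioning constant `eps` are illustrative choices, not prescribed by the abstract.

```python
import numpy as np

def adadelta_update(x, grad, acc_g, acc_dx, rho=0.95, eps=1e-6):
    """One ADADELTA step for parameter vector x given its gradient."""
    # Decaying average of squared gradients: E[g^2]_t
    acc_g = rho * acc_g + (1 - rho) * grad ** 2
    # Per-dimension step: RMS of past updates over RMS of gradients,
    # so no manually tuned learning rate appears anywhere.
    dx = -np.sqrt(acc_dx + eps) / np.sqrt(acc_g + eps) * grad
    # Decaying average of squared updates: E[dx^2]_t
    acc_dx = rho * acc_dx + (1 - rho) * dx ** 2
    return x + dx, acc_g, acc_dx

# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
x = np.array([1.0])
acc_g = np.zeros_like(x)
acc_dx = np.zeros_like(x)
for _ in range(500):
    x, acc_g, acc_dx = adadelta_update(x, 2 * x, acc_g, acc_dx)
```

Note that both accumulators carry state between iterations, which is why the function returns them alongside the updated parameters.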


References in zbMATH (referenced in 57 articles)

Showing results 41 to 57 of 57, sorted by year (citations).
  1. Loaiza-Maya, Rubén; Smith, Michael Stanley: Variational Bayes estimation of discrete-margined copula models with application to time series (2019)
  2. Pulido, Manuel; van Leeuwen, Peter Jan: Sequential Monte Carlo with kernel embedded mappings: the mapping particle filter (2019)
  3. Wang, Peiyi; Liu, Hongtao; Wu, Fangzhao; Song, Jinduo; Xu, Hongyan; Wang, Wenjun: REKA: relation extraction with knowledge-aware attention (2019)
  4. Bottou, Léon; Curtis, Frank E.; Nocedal, Jorge: Optimization methods for large-scale machine learning (2018)
  5. Ghadai, Sambit; Balu, Aditya; Sarkar, Soumik; Krishnamurthy, Adarsh: Learning localized features in 3D CAD models for manufacturability analysis of drilled holes (2018)
  6. Kojima, Ryosuke; Sato, Taisuke: Learning to rank in PRISM (2018)
  7. Lee, Seunghye; Ha, Jingwan; Zokhirova, Mehriniso; Moon, Hyeonjoon; Lee, Jaehong: Background information of deep learning for structural engineering (2018)
  8. Nguyen, Thi Nhat Anh; Bouzerdoum, Abdesselam; Phung, Son Lam: Stochastic variational hierarchical mixture of sparse Gaussian processes for regression (2018)
  9. Ong, Victor M.-H.; Nott, David J.; Smith, Michael S.: Gaussian variational approximation with a factor covariance structure (2018)
  10. Ong, Victor M. H.; Nott, David J.; Tran, Minh-Ngoc; Sisson, Scott A.; Drovandi, Christopher C.: Variational Bayes with synthetic likelihood (2018)
  11. Takase, Tomoumi; Oyama, Satoshi; Kurihara, Masahito: Why does large batch training result in poor generalization? A comprehensive explanation and a better strategy from the viewpoint of stochastic optimization (2018)
  12. Tan, Linda S. L.; Nott, David J.: Gaussian variational approximation with sparse precision matrices (2018)
  13. Tripathy, Rohit K.; Bilionis, Ilias: Deep UQ: learning deep neural network surrogate models for high dimensional uncertainty quantification (2018)
  14. Krauss, Christopher; Do, Xuan Anh; Huck, Nicolas: Deep neural networks, gradient-boosted trees, random forests: statistical arbitrage on the S&P 500 (2017)
  15. Qin, Pengda; Xu, Weiran; Guo, Jun: Providing definitive learning direction for relation classification system (2017)
  16. Rawat, Waseem; Wang, Zenghui: Deep convolutional neural networks for image classification: a comprehensive review (2017)
  17. Schmidhuber, Jürgen: Deep learning in neural networks: an overview (2015)