Adam: A Method for Stochastic Optimization. We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, which inspired Adam, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
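
The abstract does not spell out the update rule, so the following is only a minimal sketch of the commonly cited form of the Adam step: exponential moving averages of the gradient and of the squared gradient, each bias-corrected before use. The function names `adam_step` and `minimize_quadratic`, the quadratic test objective, and the step size 0.05 in the demo are illustrative assumptions, not taken from the text above. The defaults (`lr=0.001`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the values usually quoted for Adam.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of float parameters.

    m and v hold the running first- and second-moment estimates;
    t is the 1-based step count used for bias correction.
    Returns the updated (params, m, v) as new lists.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and its square.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction compensates for the zero initialisation.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Per-coordinate step scaled by the root of the second moment.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.05):
    """Hypothetical demo: minimise f(x) = sum((x_i - target_i)^2)."""
    target = [3.0, -2.0]
    x = [0.0, 0.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        grads = [2.0 * (xi - ti) for xi, ti in zip(x, target)]
        x, m, v = adam_step(x, grads, m, v, t, lr=lr)
    return x
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so multiplying a gradient coordinate by a positive constant leaves the step essentially unchanged. This is one way to see the invariance to diagonal rescaling mentioned in the abstract.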

References in zbMATH (referenced in 35 articles)

Showing results 1 to 20 of 35.
Sorted by year.

  1. Alaa, Ahmed M.; van der Schaar, Mihaela: A hidden absorbing semi-Markov model for informatively censored temporal data: learning and inference (2018)
  2. Zeyer, Albert; Alkhouli, Tamer; Ney, Hermann: RETURNN as a generic flexible neural toolkit with application to translation and speech recognition (2018) arXiv
  3. Baydin, Atılım Güneş; Pearlmutter, Barak A.; Radul, Alexey Andreyevich; Siskind, Jeffrey Mark: Automatic differentiation in machine learning: a survey (2018)
  4. Bottou, Léon; Curtis, Frank E.; Nocedal, Jorge: Optimization methods for large-scale machine learning (2018)
  5. Le, Canyu; Li, Xin: JigsawNet: shredded image reassembly using convolutional neural network and loop-based composition (2018) arXiv
  6. Chan, Shing; Elsheikh, Ahmed H.: A machine learning approach for efficient uncertainty quantification using multiscale methods (2018)
  7. Dai, Bin; Wang, Yu; Aston, John; Hua, Gang; Wipf, David: Connections with robust PCA and the role of emergent sparsity in variational autoencoder models (2018)
  8. de Bruin, Tim; Kober, Jens; Tuyls, Karl; Babuška, Robert: Experience selection in deep reinforcement learning for control (2018)
  9. Diveev, A. I.; Konstantinov, S. V.: Study of the practical convergence of evolutionary algorithms for the optimal program control of a wheeled robot (2018)
  10. E, Weinan; Yu, Bing: The Deep Ritz Method: a deep learning-based numerical algorithm for solving variational problems (2018)
  11. Hubara, Itay; Courbariaux, Matthieu; Soudry, Daniel; El-Yaniv, Ran; Bengio, Yoshua: Quantized neural networks: training neural networks with low precision weights and activations (2018)
  12. Lee, Seunghye; Ha, Jingwan; Zokhirova, Mehriniso; Moon, Hyeonjoon; Lee, Jaehong: Background information of deep learning for structural engineering (2018)
  13. Li, Qianxiao; Chen, Long; Tai, Cheng; E, Weinan: Maximum principle based algorithms for deep learning (2018)
  14. Sun, Yuhang; Liu, Qingjie: Attribute recognition from clothing using a Faster R-CNN based multitask network (2018)
  15. Ueltzhöffer, Kai: Deep active inference (2018)
  16. Ye, Jong Chul; Han, Yoseob; Cha, Eunju: Deep convolutional framelets: a general deep learning framework for inverse problems (2018)
  17. Yoo, JaeJun; Wahab, Abdul; Ye, Jong Chul: A mathematical framework for deep learning in elastic source imaging (2018)
  18. Zhang, Junbo; Zheng, Yu; Qi, Dekang; Li, Ruiyuan; Yi, Xiuwen; Li, Tianrui: Predicting citywide crowd flows using deep spatio-temporal residual networks (2018)
  19. Bui, Thang D.; Yan, Josiah; Turner, Richard E.: A unifying framework for Gaussian process pseudo-point approximations using power expectation propagation (2017)
  20. Dery, Lucio Mwinmaarong; Nachman, Benjamin; Rubbo, Francesco; Schwartzman, Ariel: Weakly supervised classification in high energy physics (2017)
