
SGD with Momentum / Adam Optimizer

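Stochastic gradient descent (SGD) with momentum and its adaptive successor Adam are the foundational parameter-update algorithms used to train nearly all modern deep learning models. SGD with momentum was formally introduced by Polyak (1964) and brought into neural network training by Rumelhart, Hinton, and Williams (1986). Adam, presented by Kingma and Ba at ICLR 2015, extends the momentum idea by also maintaining a running average of squared gradients. This yields a per-parameter adaptive learning rate and has made Adam a widely used default optimizer in contemporary deep learning practice.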
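To make the two update rules concrete, here is a minimal pure-Python sketch of one common formulation of each. The toy quadratic objective, the names `CURVATURES`, `loss`, `grad`, `sgd_momentum_step`, and `adam_step`, and the step size of 0.05 are all assumptions chosen for illustration and are not taken from this page. The values `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` follow the defaults suggested by Kingma and Ba (2015).

```python
import math

# Hypothetical toy objective: f(theta) = 0.5 * sum(c_i * theta_i**2).
CURVATURES = [1.0, 10.0]


def loss(theta):
    return 0.5 * sum(c * x * x for c, x in zip(CURVATURES, theta))


def grad(theta):
    return [c * x for c, x in zip(CURVATURES, theta)]


def sgd_momentum_step(theta, g, velocity, lr=0.05, momentum=0.9):
    """One momentum update: the velocity accumulates past gradients."""
    velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
    theta = [x + v for x, v in zip(theta, velocity)]
    return theta, velocity


def adam_step(theta, g, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (counting from 1)."""
    # Running average of gradients (the momentum-like first moment).
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    # Running average of squared gradients (the second moment).
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    # Bias correction compensates for initializing m and v at zero.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    # Each parameter gets its own effective step size lr / sqrt(v_hat).
    theta = [x - lr * mh / (math.sqrt(vh) + eps)
             for x, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v


theta_sgd, velocity = [1.0, 1.0], [0.0, 0.0]
theta_adam, m, v = [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]
for t in range(1, 301):
    theta_sgd, velocity = sgd_momentum_step(theta_sgd, grad(theta_sgd), velocity)
    theta_adam, m, v = adam_step(theta_adam, grad(theta_adam), m, v, t)

print("momentum SGD loss:", loss(theta_sgd))
print("Adam loss:", loss(theta_adam))
```

In this sketch both optimizers drive the loss toward zero from the starting point `[1.0, 1.0]`. Because Adam divides each update by the square root of that parameter's squared-gradient average, the two coordinates follow nearly the same path despite the tenfold difference in curvature, which is the per-parameter adaptivity described above. Deterministic gradients are used here for simplicity; in actual training the gradient would come from a random mini-batch.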

Sources

  1. Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. International Conference on Learning Representations (ICLR 2015). arXiv:1412.6980
  2. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536. DOI: 10.1038/323533a0
  3. Polyak, B. T. (1964). Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5), 1–17. DOI: 10.1016/0041-5553(64)90137-5
  4. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning (Ch. 8: Optimization for Training Deep Models). MIT Press. ISBN: 978-0-262-03561-3

How to cite this page

ScholarGate. (2026, June 3). Stochastic Gradient Descent with Momentum and Adaptive Moment Estimation (Adam). ScholarGate. https://scholargate.app/zh/deep-learning/stochastic-gradient-descent-with-momentum-adam-optimizer

ScholarGate. SGD with Momentum / Adam Optimizer (Stochastic Gradient Descent with Momentum and Adaptive Moment Estimation (Adam)). Retrieved 2026-06-17 from https://scholargate.app/zh/deep-learning/stochastic-gradient-descent-with-momentum-adam-optimizer · Dataset: https://doi.org/10.5281/zenodo.20539026