Stochastic Optimization — SGD and Variants
Stochastic optimization is a family of iterative methods that minimize an objective function using gradient estimates computed on randomly sampled subsets of the data (mini-batches) rather than on the entire dataset. The idea traces back to the stochastic approximation method of Robbins and Monro (1951). Through variants such as SGD with momentum, AdaGrad, RMSProp, and Adam (Kingma & Ba, 2015), it became the standard engine for training large-scale machine-learning models.
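As a rough illustration of the mini-batch idea described above, the sketch below fits a one-dimensional linear model with SGD plus a heavy-ball momentum term. Everything specific here is an assumption made for the example and not taken from the sources: the function name `minibatch_sgd`, the toy data, the mean-squared-error objective, and the hyperparameter values.

```python
import random


def minibatch_sgd(xs, ys, lr=0.05, momentum=0.9, batch_size=8, epochs=200, seed=0):
    """Fit y ~ w*x + b by mini-batch SGD with heavy-ball momentum (illustrative only)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0        # model parameters
    vw, vb = 0.0, 0.0      # momentum (velocity) terms
    indices = list(range(len(xs)))

    for _ in range(epochs):
        rng.shuffle(indices)  # new random partition into mini-batches each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]

            # Gradient of the mean squared error, estimated on this mini-batch only.
            gw = gb = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                gw += 2.0 * err * xs[i]
                gb += 2.0 * err
            gw /= len(batch)
            gb /= len(batch)

            # Momentum update: velocity accumulates past gradients, then moves the parameters.
            vw = momentum * vw - lr * gw
            vb = momentum * vb - lr * gb
            w += vw
            b += vb
    return w, b


# Hypothetical toy data: y = 3x - 1 plus small Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(64)]
ys = [3.0 * x - 1.0 + data_rng.gauss(0.0, 0.05) for x in xs]

w, b = minibatch_sgd(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Each parameter update touches only `batch_size` examples, so the cost per step does not grow with the dataset size; the price is a noisy gradient, which the momentum term partly smooths. Adaptive variants such as AdaGrad, RMSProp, and Adam keep the same loop and change only how the step for each parameter is scaled.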
Sources
- Robbins, H. & Monro, S. (1951). A Stochastic Approximation Method. Annals of Mathematical Statistics, 22(3), 400-407. DOI: 10.1214/aoms/1177729586
- Kingma, D.P. & Ba, J. (2015). Adam: A Method for Stochastic Optimization. International Conference on Learning Representations (ICLR 2015).