Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent (SGD) is a first-order iterative optimization algorithm rooted in the stochastic approximation framework introduced by Robbins and Monro in 1951. At each step it updates the model parameters using the gradient of the objective computed on a single randomly selected training example (or a small mini-batch) rather than on the full dataset. Because each update touches only a small part of the data, SGD and its variants are the standard optimizers in modern machine learning and deep learning, and they make it practical to train models on datasets too large to fit in memory.
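As a rough illustration of the update described above, the following is a minimal sketch of mini-batch SGD, assuming a one-dimensional linear model fitted with a mean squared error loss. The function name `sgd_linear_regression`, its parameters (`lr`, `epochs`, `batch_size`, `seed`) and the toy data are hypothetical choices made for this example, not part of any particular library.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, epochs=500, batch_size=1, seed=0):
    """Fit y ~ w * x + b by minimizing mean squared error with mini-batch SGD.

    Illustrative sketch only: the model, loss, and default hyperparameters
    are assumptions chosen to keep the example small.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))

    for _ in range(epochs):
        # Visit the training examples in a fresh random order each epoch.
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]

            # Gradient of the squared error, averaged over this batch only.
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * err * xs[i]
                grad_b += 2.0 * err
            grad_w /= len(batch)
            grad_b /= len(batch)

            # First-order update: step against the gradient estimate.
            w -= lr * grad_w
            b -= lr * grad_b

    return w, b


# Toy, noise-free data generated from y = 2x + 1 (an assumption for the demo).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_regression(xs, ys)
```

With `batch_size=1` each update uses a single example, as in the classic formulation; a larger `batch_size` averages the gradient over several examples, which typically reduces the variance of each step at the cost of more computation per update.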
Sources
- Robbins, H. & Monro, S. (1951). A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22(3), 400–407. DOI: 10.1214/aoms/1177729586
- Goodfellow, I., Bengio, Y. & Courville, A. (2016). Deep Learning (Ch. 8). MIT Press. ISBN: 978-0-262-03561-3