Regression and Function Approximation

Regression learns a continuous-valued function from labeled examples, predicting numeric targets and approximating an unknown input-output relationship.

Definition

Regression is the supervised task of estimating a function that maps inputs to a continuous output, typically by minimizing a loss such as squared error over the training examples, often with a regularization penalty added to shrink coefficients and limit overfitting.
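
As a minimal sketch of this definition, the snippet below fits a straight line by minimizing squared error with NumPy's `np.linalg.lstsq`. The synthetic data (intercept 2, slope 3, noise scale 0.1) are illustrative assumptions, not values from this page, and no regularization penalty is used here.

```python
import numpy as np

# Hypothetical synthetic data: y = 2.0 + 3.0 * x + Gaussian noise.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=n)

# Design matrix with a column of ones so the model has an intercept.
X = np.column_stack([np.ones(n), x])

# lstsq returns the w that minimizes the squared error ||X @ w - y||^2.
w, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)

intercept, slope = w
print(f"intercept ~ {intercept:.2f}, slope ~ {slope:.2f}")
```

The recovered intercept and slope should land close to 2 and 3; the gap shrinks as the sample grows or the noise falls.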

Scope

This topic covers supervised learning of real-valued outputs: linear and polynomial regression, basis-function and spline models, ridge and lasso regularization, the least-squares objective and its probabilistic interpretation as maximum likelihood under Gaussian noise, and the bias-variance trade-off that governs how flexible the fitted function should be.

Core questions

How is a continuous function fit to noisy labeled data?
What loss functions correspond to which noise assumptions?
How do ridge and lasso penalties trade fit against model complexity?
How flexible should a regression function be to balance bias and variance?

Key theories

Least squares and the Gauss-Markov view: Minimizing expected squared error makes the conditional mean E[y | x] the optimal predictor. For linear models with zero-mean, uncorrelated, equal-variance errors, the Gauss-Markov theorem shows that ordinary least squares gives the best linear unbiased estimator (BLUE), and when the noise is also Gaussian, least squares coincides with maximum likelihood.
Regularized regression: Ridge regression shrinks coefficients toward zero with an L2 penalty while the lasso uses an L1 penalty that can set coefficients exactly to zero, performing variable selection and improving prediction in high dimensions.
Basis-function expansion: Nonlinear relationships are captured by mapping inputs through fixed or adaptive basis functions, such as polynomials, splines, or radial functions, so that a linear model in the new features fits a nonlinear function of the originals.
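
To make the ridge/lasso contrast concrete, here is a small NumPy sketch, assuming roughly standardized features and no intercept. Ridge uses its closed form (XᵀX + λI)⁻¹Xᵀy; the lasso has no closed form, so this sketch swaps in cyclic coordinate descent with soft-thresholding, one common solver among several. The helper names `ridge`, `soft_threshold`, and `lasso_cd`, the synthetic data, and the penalty values are hypothetical choices for illustration.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge: minimize ||y - X w||^2 + lam * ||w||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(rho, lam):
    """Proximal operator of lam*|w|: shrink toward zero, exactly 0 inside [-lam, lam]."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X w||^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # diagonal of X^T X / n, one entry per feature
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every feature's contribution except feature j.
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

# Hypothetical data: only features 0 and 3 actually matter.
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
true_w = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=n)

w_ridge = ridge(X, y, lam=10.0)
w_lasso = lasso_cd(X, y, lam=0.3)
print("ridge:", np.round(w_ridge, 3))
print("lasso:", np.round(w_lasso, 3))
```

With these settings the lasso should set the three irrelevant coefficients exactly to zero, while ridge leaves all five nonzero but shrunk. The two functions scale their penalties differently (ridge on the raw sum of squares, the lasso on the averaged one), so the λ values are not directly comparable.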
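
The basis-function idea can be sketched with a fixed polynomial basis: each scalar input is mapped to powers of itself via `np.vander`, and ordinary least squares is then fit in the new features, so the model stays linear in its weights. The target sin(2πx) plus noise and the degrees tried are illustrative assumptions, and `poly_features` is a hypothetical helper.

```python
import numpy as np

def poly_features(x, degree):
    """Map scalar inputs to columns [1, x, x^2, ..., x^degree] (a fixed polynomial basis)."""
    return np.vander(x, N=degree + 1, increasing=True)

# Hypothetical nonlinear target with additive Gaussian noise.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, size=50))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)

train_mse = {}
for degree in (1, 3, 9):
    Phi = poly_features(x, degree)               # nonlinear features of x
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # linear least squares in those features
    train_mse[degree] = np.mean((Phi @ w - y) ** 2)
    print(f"degree {degree}: training MSE = {train_mse[degree]:.4f}")
```

Training error can only fall as the degree grows, because each basis contains the previous one; a low training error at high degree therefore says nothing about generalization, which is where the bias-variance trade-off and regularization come in.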

Practical relevance

Regression is central to forecasting, scientific curve fitting, risk modeling, and any other task with a numeric target. The regularization ideas developed for it, such as ridge and lasso penalties, recur throughout machine learning as a general means of controlling model complexity.

History

Least-squares regression dates to Gauss and Legendre and entered machine learning as a foundational predictive tool. Ridge regression, introduced by Hoerl and Kennard in 1970, added shrinkage to stabilize ill-conditioned least-squares problems, and the lasso, introduced by Tibshirani in 1996, made sparse regression a standard technique for high-dimensional prediction and variable selection.

Key figures

Trevor Hastie
Robert Tibshirani
Arthur Hoerl

Seminal works

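Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning, 2nd ed. Springer.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1), 267-288.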

Frequently asked questions

What is the difference between ridge and lasso regression?: Both add a penalty on coefficient size to ordinary least squares. Ridge uses a squared (L2) penalty that shrinks all coefficients smoothly, while lasso uses an absolute-value (L1) penalty that can drive some coefficients exactly to zero, effectively selecting a subset of features.
Why is squared error so commonly used?: Minimizing squared error gives the conditional mean as the best predictor and corresponds to maximum likelihood when the noise is Gaussian. It is also mathematically convenient: for linear models it has a closed-form solution, and in general it yields a smooth, differentiable objective.
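
The maximum-likelihood and conditional-mean claims can be checked with a short derivation, assuming i.i.d. Gaussian noise with fixed variance σ² and a model f(x; w); this notation is introduced here for illustration.

```latex
\begin{aligned}
&\text{Model: } y_i = f(x_i; \mathbf{w}) + \varepsilon_i, \quad \varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2) \\
&\log p(\mathbf{y} \mid \mathbf{w}) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f(x_i; \mathbf{w})\bigr)^2 \\
&\Rightarrow\ \hat{\mathbf{w}}_{\mathrm{ML}} = \arg\min_{\mathbf{w}} \sum_{i=1}^{n}\bigl(y_i - f(x_i; \mathbf{w})\bigr)^2 \\
&\text{Linear case } f(x; \mathbf{w}) = \mathbf{x}^\top \mathbf{w}: \quad X^\top X \hat{\mathbf{w}} = X^\top \mathbf{y} \\
&\text{Best predictor: } \mathbb{E}\bigl[(y - g(x))^2\bigr] = \mathbb{E}\bigl[(y - \mathbb{E}[y \mid x])^2\bigr] + \mathbb{E}\bigl[(\mathbb{E}[y \mid x] - g(x))^2\bigr]
\end{aligned}
```

The first log-likelihood term does not depend on w and the factor 1/(2σ²) only rescales the second, so maximizing the likelihood and minimizing the sum of squared errors select the same w. In the last identity the cross term has zero expectation once you condition on x, so any predictor g other than the conditional mean only adds the second, nonnegative term.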