Partial Least Squares Regression

Partial least squares regression builds, from the predictors, a small number of latent components that have high covariance with the responses, enabling prediction when predictors are numerous and collinear.

Definition

Partial least squares regression is a method that extracts mutually orthogonal latent components, each a linear combination of the predictors with weights chosen to maximize the component's covariance with the responses, and then regresses the responses on these components.
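
As a minimal sketch of this definition, the NumPy code below implements one common formulation for a single response (often called PLS1, with NIPALS-style deflation). The function name `pls1_fit` and its arguments are illustrative assumptions rather than part of any library, and multi-response variants differ in detail.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS with a NIPALS-style deflation loop (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centered working copies
    n, p = Xk.shape
    W = np.zeros((p, n_components))          # weight vectors
    P = np.zeros((p, n_components))          # predictor loadings
    q = np.zeros(n_components)               # response loadings
    T = np.zeros((n, n_components))          # latent component scores
    for a in range(n_components):
        w = Xk.T @ yk                        # unit direction maximizing covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                           # scores of the new component
        tt = t @ t
        p_a = Xk.T @ t / tt                  # regress predictors on the scores
        q_a = yk @ t / tt                    # regress response on the scores
        Xk = Xk - np.outer(t, p_a)           # deflate: remove what t explains
        yk = yk - q_a * t
        W[:, a], P[:, a], q[a], T[:, a] = w, p_a, q_a, t
    # Coefficients expressed in terms of the original predictors
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept, T
```

Predictions for new rows `X_new` would then be `X_new @ coef + intercept`. Because each component is computed from the deflated predictors, the columns of `T` come out mutually orthogonal, which matches the definition above.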

Scope

This topic covers the construction of latent components by maximizing covariance between predictor and response blocks, the contrast with principal component regression and ordinary least squares, the handling of many correlated or high-dimensional predictors, selection of the number of components by cross-validation, and the method's prominent role in chemometrics.

Core questions

How can responses be predicted when there are many highly correlated predictors?
How does covariance-based component extraction differ from variance-based principal components?
How many latent components should be retained?
Why is the method central to chemometrics?

Key theories

Covariance-maximizing components: Unlike principal component regression, which extracts components of maximal predictor variance, partial least squares extracts components of maximal covariance with the responses, directing the reduction toward prediction.
Regression on latent structures: By regressing the responses on a few extracted latent components rather than on the original predictors, the method stabilizes estimation when predictors are collinear or outnumber the observations.
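
The contrast between variance-maximizing and covariance-maximizing components can be illustrated with a small NumPy sketch. The data are synthetic and chosen for illustration: one predictor has high variance but is unrelated to the response, and the other has low variance but drives it. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x1 = 3.0 * rng.normal(size=n)        # high variance, unrelated to y
x2 = rng.normal(size=n)              # low variance, drives y
X = np.column_stack([x1, x2])
y = x2 + 0.1 * rng.normal(size=n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# First principal component direction: top right singular vector of Xc
pca_dir = np.linalg.svd(Xc, full_matrices=False)[2][0]

# First PLS weight vector for a single response: proportional to Xc^T yc
pls_dir = Xc.T @ yc
pls_dir /= np.linalg.norm(pls_dir)

def abs_corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(a, b)[0, 1])

pca_corr = abs_corr(Xc @ pca_dir, yc)   # first PC follows x1, so this is small
pls_corr = abs_corr(Xc @ pls_dir, yc)   # first PLS component leans on x2
```

In this setup the first principal component aligns with the high-variance predictor and carries little information about the response, whereas the first partial least squares component is weighted toward the predictor that actually covaries with it.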

Clinical relevance

Partial least squares regression is the workhorse of chemometrics and is widely used in spectroscopy, genomics, and other settings with many correlated predictors and few samples, where ordinary least squares is unstable.

History

Partial least squares originated in Herman Wold's iterative estimation methods and was developed by Svante Wold and colleagues into a regression tool for chemometrics, where high-dimensional, collinear spectral data made it especially valuable.

Debates

Interpretation of latent components: Each latent component is a linear combination of all predictors, which can make the components difficult to interpret. The relative merits of partial least squares versus penalized regression methods for high-dimensional prediction are also debated.

Key figures

Herman Wold
Svante Wold

Seminal works

hastie2009
wold2001
johnson2007

Frequently asked questions

How does PLS differ from principal component regression?: Principal component regression chooses components that explain predictor variance alone, while partial least squares chooses components that also have high covariance with the responses, often giving better prediction with fewer components.
When is PLS especially useful?: When predictors are highly collinear or far more numerous than observations, as in spectroscopic and genomic data. In these settings ordinary least squares is unstable or, when predictors outnumber observations, has no unique solution.
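
To illustrate why ordinary least squares breaks down when predictors outnumber observations, the NumPy sketch below builds a small synthetic dataset (dimensions and names are assumptions chosen for illustration) and shows two different coefficient vectors that fit the training data equally well.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 50                          # far more predictors than observations
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# The rank of X is at most n, so the p x p matrix X^T X in the
# normal equations has the same rank and is singular when p > n.
rank = np.linalg.matrix_rank(X)

# One exact fit: the minimum-norm least squares solution
beta_min_norm = np.linalg.pinv(X) @ y

# Another exact fit: add any direction that X maps to (numerically) zero
null_vec = np.linalg.svd(X)[2][-1]     # last right singular vector of X
beta_other = beta_min_norm + 5.0 * null_vec

fit_a = X @ beta_min_norm
fit_b = X @ beta_other
```

Both coefficient vectors reproduce the training responses, so the data alone cannot choose between them. Regressing on a few latent components is one way to restrict the problem so that a stable estimate exists.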