Partial Least Squares Regression
Partial least squares (PLS) regression builds a small number of latent components from the predictors, chosen to have high covariance with the responses, so that prediction remains possible when predictors are numerous and collinear.
Definition
Partial least squares regression is a method that extracts orthogonal latent components as linear combinations of the predictors chosen to maximize their covariance with the responses, and regresses the responses on these components.
Scope
This topic covers the construction of latent components by maximizing covariance between predictor and response blocks, the contrast with principal component regression and ordinary least squares, the handling of many correlated or high-dimensional predictors, selection of the number of components by cross-validation, and the method's prominent role in chemometrics.
Core questions
- How can responses be predicted when there are many highly correlated predictors?
- How does covariance-based component extraction differ from variance-based principal components?
- How many latent components should be retained?
- Why is the method central to chemometrics?
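One common way to approach the question of how many latent components to retain is K-fold cross-validation: fit the model with 1, 2, ... components on the training folds and keep the count with the lowest held-out prediction error. The following is a minimal sketch under stated assumptions, not a reference implementation. It uses a single-response NIPALS-style fit, and the helper names `pls1_coef` and `cv_mse`, the fold count, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def pls1_coef(X, y, n_components):
    """Return (coefficients, intercept) of a single-response PLS fit (NIPALS-style)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yc                  # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # score vector (latent component)
        p_a = Xk.T @ t / (t @ t)       # predictor loadings
        q.append(yc @ t / (t @ t))     # response loading
        Xk = Xk - np.outer(t, p_a)     # deflate the predictors
        W.append(w)
        P.append(p_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, y_mean - x_mean @ B

def cv_mse(X, y, max_components, n_folds=5, seed=0):
    """Cross-validated mean squared prediction error for 1..max_components."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_err = np.zeros(max_components)
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        for a in range(1, max_components + 1):
            B, b0 = pls1_coef(X[train], y[train], a)
            resid = y[test_idx] - (X[test_idx] @ B + b0)
            sq_err[a - 1] += np.sum(resid ** 2)
    return sq_err / len(y)

# Synthetic data (assumed for illustration): 40 collinear predictors
# driven by 3 hidden factors, 60 observations.
rng = np.random.default_rng(0)
n, p = 60, 40
L = rng.normal(size=(n, 3))
X = L @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(n, p))
y = L @ np.array([1.0, -2.0, 1.5]) + 0.1 * rng.normal(size=n)

mse = cv_mse(X, y, max_components=8)
best = int(np.argmin(mse)) + 1
print("CV mean squared error by component count:", np.round(mse, 3))
print("component count with lowest CV error:", best)
```

In practice the error curve is often flat near its minimum, so some analysts prefer the smallest count whose error is close to the minimum rather than the strict minimizer; that choice is a judgment call and is not fixed by the method itself.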
Key theories
- Covariance-maximizing components
- Unlike principal component regression, which extracts components of maximal predictor variance, partial least squares extracts components of maximal covariance with the responses, directing the reduction toward prediction.
- Regression on latent structures
- By regressing the responses on a few extracted latent components rather than on the original predictors, the method stabilizes estimation when predictors are collinear or outnumber the observations.
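The two ideas above can be sketched for the single-response case. For centered data, the unit-length weight vector that maximizes the covariance between the component and the response is proportional to X^T y; the component is then removed from the predictors (deflation) and the step is repeated. This is a minimal NIPALS-style sketch, assuming one response variable. The function name `pls1_fit` and the synthetic data are hypothetical, and other PLS algorithms differ in detail.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS by NIPALS-style deflation.

    Returns regression coefficients, intercept, and the score matrix T.
    Assumes n_components does not exceed the rank of the centered X.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean                    # centered predictors, deflated each step
    yc = y - y_mean
    n, p = X.shape
    W = np.zeros((p, n_components))    # weight vectors
    P = np.zeros((p, n_components))    # predictor loadings
    T = np.zeros((n, n_components))    # scores (latent components)
    q = np.zeros(n_components)         # response loadings
    for a in range(n_components):
        w = Xk.T @ yc                  # maximizes cov(Xk w, y) over unit-norm w
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p_a = Xk.T @ t / tt
        q[a] = yc @ t / tt             # regress the response on this component
        Xk = Xk - np.outer(t, p_a)     # remove the component from the predictors
        W[:, a], P[:, a], T[:, a] = w, p_a, t
    # Express the fit in terms of the original predictors.
    B = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ B
    return B, intercept, T

# Synthetic data (assumed for illustration): more predictors than observations,
# all driven by two hidden factors, so ordinary least squares has no unique fit.
rng = np.random.default_rng(0)
n, p = 30, 50
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, p)) + 0.05 * rng.normal(size=(n, p))
y = latent[:, 0] - 2.0 * latent[:, 1] + 0.05 * rng.normal(size=n)

B, b0, T = pls1_fit(X, y, n_components=2)
y_hat = X @ B + b0
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"training R^2 with 2 components: {r2:.3f}")
print(f"inner product of the two score vectors: {T[:, 0] @ T[:, 1]:.2e}")
```

The deflation step is what makes successive score vectors orthogonal, which is why the response can be regressed on each component separately. The training R^2 printed here is optimistic; held-out error is the appropriate guide for choosing the number of components.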
Practical relevance
Partial least squares regression is the workhorse of chemometrics and is widely used in spectroscopy, genomics, and other settings with many correlated predictors and few samples, where ordinary least squares estimates are unstable or not uniquely defined.
History
Partial least squares originated in Herman Wold's iterative estimation methods and was developed by Svante Wold and colleagues into a regression tool for chemometrics, where high-dimensional, collinear spectral data made it especially valuable.
Debates
- Interpretation of latent components
- Each latent component is a combination of all predictors, which can make it difficult to interpret. The relative merits of partial least squares versus penalized regression methods, such as ridge regression and the lasso, for high-dimensional prediction are also debated.
Key figures
- Herman Wold
- Svante Wold
Related topics
Seminal works
- hastie2009
- wold2001
- johnson2007
Frequently asked questions
- How does PLS differ from principal component regression?
- Principal component regression chooses components that explain predictor variance alone, while partial least squares chooses components that also have high covariance with the responses, often giving better prediction with fewer components.
- When is PLS especially useful?
- When predictors are highly collinear or far more numerous than observations, as in spectroscopic and genomic data. In these settings ordinary least squares is unstable or, when predictors outnumber observations, has no unique solution.
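The contrast with principal component regression in the first question above can be illustrated with a single component. This is a hedged sketch on synthetic data constructed for the purpose: the direction of largest predictor variance is deliberately unrelated to the response, while a lower-variance direction determines it. The helper `one_component_r2` and all data-generating choices are assumptions made for illustration.

```python
import numpy as np

def one_component_r2(t, y):
    """Training R^2 from regressing centered y on one centered score vector t."""
    yc = y - y.mean()
    return (t @ yc) ** 2 / ((t @ t) * (yc @ yc))

rng = np.random.default_rng(1)
n, p = 2000, 6
Z = rng.normal(size=(n, p))                       # independent hidden directions
scales = np.array([3.0, 1.0, 0.3, 0.3, 0.3, 0.3]) # direction 0 has the most variance
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))      # random rotation mixes the predictors
X = (Z * scales) @ Q.T
y = Z[:, 1] + 0.1 * rng.normal(size=n)            # response depends on direction 1 only

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Principal component regression: first component maximizes predictor variance.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
t_pcr = Xc @ Vt[0]

# Partial least squares: first component maximizes covariance with the response.
w = Xc.T @ yc
w /= np.linalg.norm(w)
t_pls = Xc @ w

r2_pcr = one_component_r2(t_pcr, y)
r2_pls = one_component_r2(t_pls, y)
print(f"one-component PCR training R^2: {r2_pcr:.3f}")
print(f"one-component PLS training R^2: {r2_pls:.3f}")
```

Principal component regression would need a second component to reach the predictive direction here, whereas the first PLS component already leans toward it. The example is constructed to make the difference visible; on real data the gap can be much smaller.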