Canonical Correlation Analysis
Canonical correlation analysis finds pairs of linear combinations, one from each of two sets of variables, that are maximally correlated with each other.
Definition
Canonical correlation analysis is a method that, given two sets of variables, constructs successive pairs of linear combinations, one from each set, so that each pair has the largest possible correlation subject to being uncorrelated with all earlier pairs.
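The definition can be written as a constrained maximization. The notation below is an assumption introduced for illustration and does not come from the text: $x \in \mathbb{R}^p$ and $y \in \mathbb{R}^q$ are the two variable sets, $\Sigma_{xx}$ and $\Sigma_{yy}$ their within-set covariance matrices, $\Sigma_{xy}$ their cross-covariance, and $a_k$, $b_k$ the weight vectors of the $k$-th pair.

```latex
\rho_k \;=\; \max_{a,\,b}\;
  \frac{a^\top \Sigma_{xy}\, b}
       {\sqrt{a^\top \Sigma_{xx}\, a}\;\sqrt{b^\top \Sigma_{yy}\, b}}
\qquad \text{subject to} \qquad
a^\top \Sigma_{xx}\, a_j = 0, \quad b^\top \Sigma_{yy}\, b_j = 0
\quad \text{for all } j < k .
```

Here $\rho_k$ is the $k$-th canonical correlation and $u_k = a_k^\top x$, $v_k = b_k^\top y$ are the $k$-th pair of canonical variates. For $k = 1$ there are no constraints, and at most $\min(p, q)$ pairs exist.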
Scope
This topic covers the definition of canonical variates and canonical correlations as the solution of a generalized eigenvalue problem involving the within-set covariances and the cross-covariance of two variable sets, the uncorrelatedness of successive canonical variates, the interpretation of canonical loadings, and the embedding of multiple regression and discriminant analysis as special cases of the canonical framework.
Core questions
- What linear combinations of two variable sets are most strongly associated?
- How many independent dimensions of association exist between the two sets?
- How are canonical variates and their loadings interpreted?
- How do regression and discriminant analysis arise as special cases?
Key theories
- Canonical variates as a generalized eigenproblem
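- The canonical correlations are the square roots of the eigenvalues of the matrix product of the inverse covariance of the first set, the cross-covariance between the sets, the inverse covariance of the second set, and the transposed cross-covariance. The canonical variates are the linear combinations of each variable set defined by the associated eigenvectors.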
- Unifying framework for multivariate association
- Multiple correlation, discriminant analysis, and correspondence analysis can each be cast as instances of canonical correlation between suitably chosen variable sets, giving the method a unifying role in multivariate analysis.
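The eigenproblem formulation above can be sketched numerically. The following is a minimal illustration using NumPy on synthetic data, not a reference implementation: the function name `canonical_correlations` and the simulated data are hypothetical, and the sketch assumes both sample covariance matrices are invertible.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Return the canonical correlations of data matrices X (n x p) and Y (n x q).

    Hypothetical helper for illustration; assumes Sxx and Syy are invertible.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    # Centre each column so that cross-products estimate covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Within-set and between-set sample covariance matrices.
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    # M = Sxx^{-1} Sxy Syy^{-1} Syx; its eigenvalues are the squared
    # canonical correlations.
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    eigvals = np.linalg.eigvals(M).real
    # Sort descending, keep min(p, q) values, and clip rounding noise.
    k = min(X.shape[1], Y.shape[1])
    return np.sqrt(np.clip(np.sort(eigvals)[::-1][:k], 0.0, 1.0))


# Synthetic example: one shared latent signal plus unrelated noise columns.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.5 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
Y = np.hstack([z + 0.5 * rng.normal(size=(500, 1)), rng.normal(size=(500, 1))])
rho = canonical_correlations(X, Y)
print(rho)
```

With one shared signal, the first canonical correlation should be clearly larger than the second. Forming and inverting covariance matrices directly is the simplest route to show the idea; for ill-conditioned data a QR- or SVD-based computation is generally more stable.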
Clinical relevance
Canonical correlation analysis is used to relate two blocks of measurements, such as a set of predictors and a set of outcomes, or two modalities of data, identifying the strongest shared dimensions of variation between them.
History
Canonical correlation analysis was introduced by Hotelling in 1936 as a general method for relating two sets of variates. It was later developed within the formal theory of multivariate analysis and recognized as a unifying framework subsuming several other multivariate techniques.
Key figures
- Harold Hotelling
- T. W. Anderson
Related topics
Seminal works
- anderson2003
- mardia1979
- johnson2007
Frequently asked questions
- How does canonical correlation differ from multiple regression?
- Multiple regression relates several predictors to a single response, whereas canonical correlation relates two sets of variables symmetrically, finding combinations on both sides that are maximally correlated.
- What does the first canonical correlation represent?
- It is the largest possible correlation between any linear combination of the first variable set and any linear combination of the second, measuring the strongest dimension of association between the two sets.
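The relationship to multiple regression can be checked numerically: when the second set contains a single variable, the first (and only) canonical correlation coincides with the multiple correlation coefficient R of the regression. The sketch below uses NumPy with synthetic data; the variable names and the simulated coefficients are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))                                # synthetic predictors
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=n)    # synthetic response

# Multiple correlation R from ordinary least squares with an intercept.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
r_squared = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
multiple_r = np.sqrt(r_squared)

# Canonical correlation when the second set is the single variable y:
# rho^2 = s_yx Sxx^{-1} s_xy / s_yy.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
Sxx = Xc.T @ Xc / (n - 1)
sxy = Xc.T @ yc / (n - 1)
syy = yc @ yc / (n - 1)
canonical_r = np.sqrt(sxy @ np.linalg.solve(Sxx, sxy) / syy)

print(multiple_r, canonical_r)
```

The two printed values agree up to floating-point error, which illustrates how multiple regression sits inside the canonical framework as the one-response special case.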