Multivariate Normal Distribution
The multivariate normal distribution generalizes the bell curve to random vectors, characterized completely by a mean vector and a covariance matrix.
Definition
A random vector has a multivariate normal distribution if every linear combination of its components is univariate normal. The distribution is fully determined by its mean vector and covariance matrix.
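The definition can be written out as a short sketch in standard notation. The symbols p (dimension), \mu (mean vector), \Sigma (covariance matrix), a, and t are notation assumed here rather than taken from the text, and a point mass is counted as a normal distribution with zero variance so that singular \Sigma is covered.

```latex
% Notation assumed here: X is a p-dimensional random vector,
% \mu its mean vector, \Sigma its covariance matrix.
X \sim N_p(\mu, \Sigma)
\iff
a^{\top} X \sim N\!\left(a^{\top}\mu,\; a^{\top}\Sigma a\right)
\quad \text{for every } a \in \mathbb{R}^p

% Density, defined when \Sigma is positive definite
f(x) = (2\pi)^{-p/2} \, |\Sigma|^{-1/2}
       \exp\!\left( -\tfrac{1}{2} (x-\mu)^{\top} \Sigma^{-1} (x-\mu) \right)

% Characteristic function, defined for any positive semidefinite \Sigma
\varphi_X(t) = \exp\!\left( i\, t^{\top}\mu - \tfrac{1}{2}\, t^{\top}\Sigma t \right)
```

The quadratic form in the exponent of the density is the squared Mahalanobis distance from x to the mean, which is why the contours of constant density are ellipsoids.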
Scope
This topic covers the density and characteristic function of the multivariate normal, its closure under linear transformation, marginalization, and conditioning, the relationship between zero covariance and independence for normal variables, the geometry of its elliptical contours and the Mahalanobis distance, and its role as the assumed model in classical multivariate inference.
Core questions
- What characterizes the multivariate normal distribution?
- How do marginal and conditional distributions of a normal vector behave?
- Why does it appear so often as a modeling assumption?
- How is its elliptical geometry related to the Mahalanobis distance?
Key theories
- Closure properties
- Linear transformations, marginals, and conditionals of a multivariate normal vector are themselves normal. Conditional means are linear in the conditioning values, and the conditional covariance does not depend on them. These properties make the distribution exceptionally tractable.
- Elliptical geometry and Mahalanobis distance
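- The contours of constant density are ellipsoids centered at the mean, and each contour is the set of points at a fixed Mahalanobis distance from the mean. For a multivariate normal vector, the squared Mahalanobis distance follows a chi-squared distribution with as many degrees of freedom as the vector has components, a fact that underlies many multivariate test statistics.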
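The conditioning part of the closure properties can be illustrated with a minimal sketch for the bivariate case. The function name `condition_bivariate` and the example numbers are hypothetical, chosen only for illustration; the formulas are the standard partitioned-normal results specialized to two components.

```python
def condition_bivariate(mu, sigma, x2):
    """Mean and variance of X1 given X2 = x2 for a bivariate normal (X1, X2).

    mu    -- (mu1, mu2), the mean vector
    sigma -- ((s11, s12), (s12, s22)), a symmetric positive definite covariance
    General form: mean = mu1 + S12 S22^{-1} (x2 - mu2),
                  cov  = S11 - S12 S22^{-1} S21.
    """
    mu1, mu2 = mu
    (s11, s12), (_, s22) = sigma
    slope = s12 / s22                      # regression coefficient of X1 on X2
    cond_mean = mu1 + slope * (x2 - mu2)   # linear in the conditioning value
    cond_var = s11 - s12 * s12 / s22       # does not depend on x2
    return cond_mean, cond_var


# Illustrative (made-up) parameters
mu = (1.0, 2.0)
sigma = ((4.0, 1.2), (1.2, 1.0))
for x2 in (0.0, 2.0, 5.0):
    mean, var = condition_bivariate(mu, sigma, x2)
    print(f"x2={x2}: conditional mean={mean:.2f}, conditional variance={var:.2f}")
```

In this example the conditional mean moves along a straight line as x2 changes, while the conditional variance stays at the same value for every x2.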
Clinical relevance
The multivariate normal model justifies the sampling distributions used in multivariate testing and estimation. It also serves as the class-conditional distribution in Gaussian discriminant analysis and as the component distribution in Gaussian mixture clustering.
History
The multivariate normal distribution was developed alongside correlation and regression theory in the late nineteenth and early twentieth centuries, and it became the foundation of the classical theory of multivariate analysis formalized in mid-twentieth-century texts.
Debates
- Validity of the normality assumption
- Many classical procedures assume multivariate normality, but real multivariate data often show heavy tails or skewness, prompting robust and elliptical-distribution alternatives and tests of multivariate normality.
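As one illustration of how such a check might look, the sketch below computes Mardia's multivariate kurtosis for two-dimensional data. The choice of this particular diagnostic is an assumption made here, since the text does not name a specific test, and the function name and simulated samples are hypothetical. Under multivariate normality the statistic should be close to p(p + 2), which is 8 for p = 2; heavy tails push it higher.

```python
import random


def mardia_kurtosis_2d(data):
    """Mardia's multivariate kurtosis for a list of (x, y) pairs (p = 2).

    It is the average of the squared values of the squared Mahalanobis
    distances, computed with the sample mean and the maximum-likelihood
    covariance estimate (divisor n).
    """
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / n
    syy = sum((y - my) ** 2 for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data) / n
    det = sxx * syy - sxy * sxy
    total = 0.0
    for x, y in data:
        dx, dy = x - mx, y - my
        # Squared Mahalanobis distance via the explicit 2x2 inverse
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        total += d2 * d2
    return total / n


rng = random.Random(0)

# Simulated bivariate normal sample with covariance [[4, 1.2], [1.2, 1]]
normal_sample = []
for _ in range(20000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    normal_sample.append((2.0 * z1, 0.6 * z1 + 0.8 * z2))

# Heavy-tailed variant: about 10% of points are inflated by a factor of 3
heavy_sample = []
for x, y in normal_sample:
    scale = 3.0 if rng.random() < 0.1 else 1.0
    heavy_sample.append((scale * x, scale * y))

print("normal sample:      ", mardia_kurtosis_2d(normal_sample))
print("heavy-tailed sample:", mardia_kurtosis_2d(heavy_sample))
```

A formal test would compare the statistic with its sampling distribution under normality; this sketch only shows the direction in which heavy tails move it.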
Key figures
- T. W. Anderson
- Robb Muirhead
Related topics
Seminal works
- anderson2003
- mardia1979
- muirhead1982
Frequently asked questions
- Does zero correlation imply independence for normal variables?
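- For components of a single multivariate normal vector, uncorrelated components are indeed independent; this equivalence is special to the normal and does not hold for distributions in general. Joint normality is essential: two variables that are each normal but not jointly normal can be uncorrelated and still dependent.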
- What is the Mahalanobis distance?
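- It is a scale- and correlation-adjusted distance from a point to the mean. For multivariate normal data, when the true mean and covariance are used, its square follows a chi-squared distribution with degrees of freedom equal to the number of components. It is used to detect outliers and in classification.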
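A minimal sketch of the Mahalanobis distance used as an outlier screen in two dimensions follows. The function names, the example covariance, and the 97.5% cutoff are assumptions chosen for illustration. The chi-squared quantile uses the closed form available for 2 degrees of freedom, where the distribution function is 1 - exp(-t/2).

```python
import math


def mahalanobis_sq_2d(x, mu, sigma):
    """Squared Mahalanobis distance (x - mu)' Sigma^{-1} (x - mu) in 2 dimensions."""
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    (a, b), (_, d) = sigma
    det = a * d - b * b
    if det <= 0:
        raise ValueError("sigma must be positive definite")
    # Inverse of [[a, b], [b, d]] is [[d, -b], [-b, a]] / det
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


def chi2_quantile_2df(q):
    """Quantile of the chi-squared distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - q)


# Illustrative (made-up) parameters: positively correlated components
mu = (0.0, 0.0)
sigma = ((4.0, 1.2), (1.2, 1.0))
cutoff = chi2_quantile_2df(0.975)

for point in [(2.0, 0.6), (2.0, -1.5)]:
    d2 = mahalanobis_sq_2d(point, mu, sigma)
    label = "flagged" if d2 > cutoff else "not flagged"
    print(point, round(d2, 3), label)
```

The first point lies along the direction of positive correlation and has a squared distance of 1. The second lies against that direction, so its squared distance exceeds the cutoff even though it is not much farther from the mean in Euclidean terms.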