Correlation and Covariance

Covariance measures how two variables vary together, and correlation rescales that joint variation into a coefficient between -1 and +1 that captures the strength and direction of their linear association without depending on the units of measurement. Correlation is one of the first tools used to describe the relationship between two continuous quantities in health research.

Definition

Covariance is the average product of the deviations of two variables from their means; correlation is the covariance divided by the product of the two standard deviations, yielding a unitless coefficient between -1 and +1 that quantifies the strength and direction of their linear association.
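In symbols, for $n$ paired observations $(x_i, y_i)$ with means $\bar{x}$, $\bar{y}$ and sample standard deviations $s_x$, $s_y$, the two quantities can be written as below. This sketch uses the sample $(n-1)$ denominator. The population version divides by $n$ instead, and the denominator cancels in $r$ either way.

```latex
\operatorname{cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1 .
```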

Scope

This entry covers covariance and its standardised form, the Pearson product-moment correlation coefficient, the rank-based Spearman correlation for monotonic association, and the common cautions: correlation describes association rather than causation, reflects linear (or monotonic) relationships only, and is distinct from agreement. It is a methodological topic, not clinical guidance.

Core questions

How is the joint variation of two variables summarised in a single number?
What does a correlation coefficient of a given size mean, and what does its sign indicate?
When should a rank-based (Spearman) rather than a Pearson coefficient be used?
Why does correlation not imply causation, and why is it not the same as agreement?

Key concepts

Covariance
Pearson product-moment correlation coefficient
Spearman rank correlation
Standardisation and unit-free measurement
Linear versus monotonic association
Correlation is not causation
Correlation versus agreement

Mechanisms

Covariance accumulates the products of paired deviations from each variable's mean. It is positive when high values of one variable tend to accompany high values of the other, and negative when high values of one tend to accompany low values of the other, but its magnitude depends on the units. Dividing by the two standard deviations removes the units and bounds the result between -1 and +1. The result is the Pearson correlation coefficient, which captures strictly linear association. When the relationship is monotonic but not linear, or the data are ordinal or non-normal, the Spearman coefficient (Pearson's coefficient applied to the ranks) is used instead. A correlation near zero indicates little or no linear association but does not rule out a strong nonlinear relationship.
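The following is a minimal, standard-library-only Python sketch of these mechanisms. It assumes the sample (n - 1) denominator for covariance and average ranks for ties. The function names and the toy numbers are illustrative and are not taken from any particular package or study. The example shows three points: rescaling a variable changes the covariance but not r; a monotonic but nonlinear relationship gives a Spearman coefficient of 1 while Pearson's r stays below 1; and a perfect U-shaped relationship gives r = 0.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance: summed products of paired deviations, divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def pearson(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's coefficient: Pearson's coefficient applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data, chosen only to illustrate the mechanisms.
x = [1, 2, 3, 4, 5]
cubic = [v ** 3 for v in x]       # monotonic but not linear
x_scaled = [100 * v for v in x]   # the same variable expressed in other units

print(covariance(x, cubic), covariance(x_scaled, cubic))                # 76.0 7600.0
print(round(pearson(x, cubic), 3), round(pearson(x_scaled, cubic), 3))  # 0.943 0.943
print(round(spearman(x, cubic), 3))                                     # 1.0

# A perfect but U-shaped relationship: Pearson's r is exactly zero.
sym = [-2, -1, 0, 1, 2]
print(pearson(sym, [v ** 2 for v in sym]))                             # 0.0
```

In practice, SciPy's `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same coefficients and also report p-values. The from-scratch version above is only meant to make each step visible.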

Clinical relevance

Correlation coefficients are routinely reported when researchers describe how two clinical measurements move together. A key caution in appraisal is that a high correlation between two measurement methods does not mean they agree, since two instruments can be strongly correlated yet systematically differ; agreement is assessed by other approaches such as limits-of-agreement analysis. This entry describes the method and is not a basis for individual clinical decisions.
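The next sketch shows why correlation is not agreement. It uses two hypothetical instruments, where method B is constructed to read 20% higher than method A plus a constant 3 units. The limits of agreement follow the usual Bland-Altman recipe: the mean difference (bias) plus or minus 1.96 standard deviations of the paired differences. The readings and function names are illustrative assumptions, not data from any study.

```python
import math
import statistics


def pearson(x, y):
    """Pearson's r: summed products of deviations over the root of the sums of squares."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def limits_of_agreement(a, b, z=1.96):
    """Bland-Altman style summary: bias and bias +/- z * SD of the differences (b - a)."""
    diffs = [bi - ai for ai, bi in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical readings on the same six subjects; method B = 1.2 * A + 3 by construction.
method_a = [50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
method_b = [1.2 * v + 3.0 for v in method_a]

r = pearson(method_a, method_b)
bias, lower, upper = limits_of_agreement(method_a, method_b)
print(f"r = {r:.3f}")                                  # r = 1.000
print(f"bias = {bias:.1f}, limits = ({lower:.1f}, {upper:.1f})")  # bias = 18.0, limits = (10.7, 25.3)
```

The correlation is perfect, yet method B reads 18 units higher on average. In this toy example the differences also grow with the size of the measurement (a proportional bias), which a Bland-Altman plot of differences against means would make visible.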

Evidence & guidelines

Standard medical-statistics texts and the Statistics Notes series in the BMJ set out how correlation should be reported and interpreted, including the distinction between correlation and agreement that motivated the Bland-Altman limits-of-agreement approach for method-comparison studies.

History

The correlation coefficient grew out of Francis Galton's work on heredity and was formalised by Karl Pearson at the end of the nineteenth century. Charles Spearman introduced the rank-based coefficient in 1904 for situations where only the ordering of values is reliable. In the late twentieth century, Bland and Altman drew a sharp and influential distinction between correlation and agreement, reshaping how method-comparison studies are analysed.

Debates

Does a high correlation demonstrate that two measurement methods agree?: No: two methods can be highly correlated while differing systematically, so correlation is an inappropriate measure of agreement. Bland and Altman argued for limits-of-agreement analysis instead, a position now standard in method-comparison studies.

Key figures

Francis Galton
Karl Pearson
Charles Spearman
Douglas Altman
Martin Bland

Seminal works

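Spearman C. The proof and measurement of association between two things. American Journal of Psychology, 1904.
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, 1986.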

Frequently asked questions

What is the difference between covariance and correlation?: Covariance measures how two variables vary together but its size depends on their units, so it is hard to interpret directly. Correlation standardises covariance by the two standard deviations, producing a unitless coefficient between -1 and +1 that is comparable across variables.
When should Spearman rather than Pearson correlation be used?: Spearman correlation, which works on ranks, is preferred when the relationship is monotonic but not linear, when the data are ordinal, or when outliers or non-normal distributions would distort the Pearson coefficient.