Expected Value and Variance

A distribution's centre and spread

The expected value E[X] is the long-run average of a random variable, computed as a probability-weighted sum of its possible values. The variance Var[X] measures spread around that mean as the expected squared deviation. Its square root, the standard deviation, has the same units as the original variable. The mean is the first moment of a distribution and the variance is its second central moment; together they summarise location and dispersion. Computing and reporting both correctly makes research findings easier to interpret and to trust.

Concept and Formula

The expected value is the average you would approach if an experiment were repeated indefinitely. For a discrete random variable it is defined as E[X] = sum over i of x_i * P(X = x_i); for a continuous variable the sum is replaced by an integral of x weighted by the probability density. Expected value is linear: E[aX + b] = aE[X] + b holds for any constants a and b, which simplifies many calculations. The variance is then Var[X] = E[(X - E[X])^2], the expected squared deviation of the variable from its mean.
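As a rough illustration of these definitions, the sketch below computes E[X] and Var[X] for a discrete distribution stored as a value-to-probability mapping, and checks linearity for one choice of constants. The helper names expected_value and variance, the fair-die example, and the constants a = 2 and b = 3 are assumptions made for this example only.

```python
from fractions import Fraction

def expected_value(pmf):
    """Probability-weighted sum of the values in a {value: probability} mapping."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Expected squared deviation from the mean, E[(X - E[X])^2]."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Hypothetical example distribution: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mean_die = expected_value(die)  # 7/2
var_die = variance(die)         # 35/12

# Linearity: E[aX + b] should equal a*E[X] + b (a and b chosen arbitrarily).
a, b = 2, 3
transformed = {a * x + b: p for x, p in die.items()}
mean_transformed = expected_value(transformed)

print(mean_die, var_die, mean_transformed)
```

Exact fractions are used here so that the results can be compared without floating-point rounding; with floats the same functions work but equality checks should use a tolerance.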

Computing and Reading the Values

In practice, expected value and variance are usually estimated from sample data. The sample mean x-bar is an unbiased estimator of E[X]. The sample variance s^2 divides the sum of squared deviations by n - 1 rather than n (Bessel's correction), which makes it an unbiased estimator of Var[X]. The standard deviation s is the square root of the sample variance and has the same units as the original variable, making it directly interpretable. For instance, if s equals 5 cm for a set of measurements, the typical deviation from the mean is roughly 5 cm. Because variance is in squared units (here cm^2), the standard deviation is almost always preferred when reporting results.
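The following sketch shows one way these sample estimates might be computed by hand and cross-checked against Python's standard statistics module. The measurements in heights_cm are made-up numbers used purely for illustration.

```python
import math
import statistics

# Hypothetical measurements in centimetres (invented for this example).
heights_cm = [150.0, 155.0, 160.0, 165.0, 170.0]

n = len(heights_cm)
x_bar = sum(heights_cm) / n

# Sample variance: squared deviations divided by n - 1 (Bessel's correction).
s_squared = sum((x - x_bar) ** 2 for x in heights_cm) / (n - 1)

# Standard deviation: back in the original units (cm).
s = math.sqrt(s_squared)

# statistics.variance and statistics.stdev also use the n - 1 denominator;
# statistics.pvariance divides by n instead.
lib_variance = statistics.variance(heights_cm)
lib_pvariance = statistics.pvariance(heights_cm)

print(f"mean = {x_bar} cm, s^2 = {s_squared} cm^2, s = {s:.2f} cm")
```

For these five values the n - 1 denominator gives 62.5 cm^2, whereas dividing by n would give 50 cm^2; the gap shrinks as the sample size grows.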

Common Misconceptions

The most common error is treating the expected value as a result that must occur on a single trial. In reality E[X] is a long-run average and may never be observed; the expected value of a fair die is 3.5, which is not even a possible outcome. A second misconception is viewing low variance as always desirable and high variance as always bad; some quantities are naturally highly variable, and a model must reflect that variability rather than hide it. A third confusion is assuming that because variances add for independent variables, standard deviations add too. For independent X and Y, Var[X + Y] = Var[X] + Var[Y] holds, but SD[X + Y] is the square root of that sum, sqrt(Var[X] + Var[Y]), not SD[X] + SD[Y].
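The first and third points can be checked numerically. The sketch below, which assumes two independent fair dice as the example, enumerates the exact distribution of their sum and compares its variance and standard deviation with those of a single die. The helper name mean_var is hypothetical.

```python
import math
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)

def mean_var(pmf):
    """Return (mean, variance) of a {value: probability} mapping."""
    mu = sum(x * q for x, q in pmf.items())
    var = sum((x - mu) ** 2 * q for x, q in pmf.items())
    return mu, var

die = {face: p for face in faces}

# Exact distribution of X + Y for two independent dice:
# each ordered pair (x, y) has probability p * p.
total = {}
for x, y in product(faces, repeat=2):
    total[x + y] = total.get(x + y, 0) + p * p

mu_die, var_die = mean_var(die)
mu_sum, var_sum = mean_var(total)

sd_die = math.sqrt(var_die)
sd_sum = math.sqrt(var_sum)

print("E[X] is a possible outcome:", mu_die in die)   # 3.5 is not a face
print("Var[X + Y] == 2 * Var[X]:", var_sum == 2 * var_die)
print(f"SD[X + Y] = {sd_sum:.3f}, SD[X] + SD[Y] = {2 * sd_die:.3f}")
```

The variance of the sum is exactly twice the single-die variance, while the standard deviation of the sum grows only by a factor of sqrt(2), so it is smaller than the sum of the two standard deviations.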

Why It Matters and How to Report It

Expected value and variance are foundational to statistical inference: hypothesis tests, confidence intervals, and regression estimates all rely on means and variances. In research reports, mean and standard deviation should be presented together; the variance itself is usually reported only in ANOVA tables or other technical contexts. The unit of measurement must always be stated, and when comparing multiple groups each group's mean and standard deviation should be reported separately. Writing M = value and SD = value, typically in a table, follows widely used publication conventions such as APA style and clearly communicates both location and spread to readers.
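As one possible way to produce such a summary, the sketch below formats a per-group line with n, M, and SD, including the unit. The group names, the measurements, the summarise helper, and the choice of one decimal place are all assumptions made for illustration, not a prescribed format.

```python
import statistics

# Hypothetical measurements in cm for two invented groups.
groups = {
    "Control": [150.0, 155.0, 160.0, 165.0, 170.0],
    "Treatment": [158.0, 160.0, 162.0, 164.0, 166.0],
}

def summarise(name, values, unit="cm"):
    """Format one group's sample size, mean, and standard deviation with units."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    return f"{name}: n = {len(values)}, M = {m:.1f} {unit}, SD = {sd:.1f} {unit}"

# One line per group, so each mean and SD is reported separately.
report = [summarise(name, values) for name, values in groups.items()]
print("\n".join(report))
```

Reporting n alongside M and SD is a common addition, since the same standard deviation carries different weight in a sample of 5 than in a sample of 500.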