Statistical Power and Sample Size

Statistical power is the probability that a study will detect an effect of a given size when that effect truly exists; formally, it is one minus the Type II error rate. Sample size determination is the planning step that chooses how many participants are needed to achieve a target power, given the expected effect size, the chosen significance level, and the variability of the data. Together they decide whether a study is large enough to give its question a fair chance of an answer.

Definition

Statistical power is the probability that a test correctly rejects a false null hypothesis (detects a real effect of specified size); sample size determination is the calculation of the number of observations required to achieve a target power at a given significance level for an assumed effect size and variability.
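
In symbols, the definition can be written as follows. The notation is a conventional choice rather than something fixed by this entry: H_0 is the null hypothesis, H_1 is the alternative with the specified effect size, and alpha and beta are the Type I and Type II error rates.

```latex
\begin{align*}
\alpha &= \Pr(\text{reject } H_0 \mid H_0 \text{ true}) \\
\beta &= \Pr(\text{fail to reject } H_0 \mid H_1 \text{ true}) \\
\text{power} &= 1 - \beta = \Pr(\text{reject } H_0 \mid H_1 \text{ true})
\end{align*}
```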

Scope

This topic explains what power means, the four interlocking quantities of a power calculation (effect size, significance level, power, and sample size), and the consequences of underpowered research. It is presented as a reference methodology for planning and appraising studies, not as a clinical decision rule.

Core questions

How likely is the study to detect the effect it is looking for?
How many participants are needed to reach a target power?
How do effect size, variability, and significance level drive sample size?
What goes wrong when a study is underpowered?

Key concepts

Statistical power (1 minus beta)
Effect size
Significance level (alpha)
Variability and standard deviation
A priori sample size calculation
Underpowered study
Minimum clinically important difference

Mechanisms

Power, significance level, effect size, and sample size are linked so that, for a given variability, fixing any three determines the fourth. For a given significance level, power rises as the true effect size grows, as variability falls, and as the sample size increases. Sample size calculation inverts this relationship: starting from an assumed effect size (often the minimum worth detecting), a chosen significance level, and a target power (conventionally 80% or 90%), it solves for the number of observations needed. Underpowering not only raises the chance of missing real effects (Type II error) but also makes any significant finding more likely to be exaggerated or false, because in a small study only large, possibly inflated estimates clear the significance threshold.
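
The relationship described above can be sketched for one common case: a two-sided comparison of means between two equal-sized groups, using the normal approximation. This is a minimal illustration under those assumptions, not a replacement for dedicated software. The function names `n_per_group` and `approx_power` and their parameters are hypothetical, and because the approximation ignores the t-distribution correction it tends to return a participant or two fewer than an exact t-based calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate participants per group needed to detect a mean difference.

    Assumes two equal-sized groups, a common standard deviation `sd`,
    a two-sided test, and a normal approximation to the test statistic.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = _STD_NORMAL.inv_cdf(power)           # quantile for target power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def approx_power(delta, sd, n, alpha=0.05):
    """Approximate power with `n` participants per group (same assumptions).

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n)             # SE of the difference in means
    return _STD_NORMAL.cdf(abs(delta) / standard_error - z_alpha)
```

For a standardised difference of 0.5 (delta=0.5, sd=1.0), `n_per_group` returns 63 per group at 80% power and 85 at 90% power. Halving the difference to 0.25 roughly quadruples the requirement, to 252 per group, which shows how strongly the assumed effect size drives sample size.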

Clinical relevance

Whether a trial or study was adequately powered shapes how its results should be read: a non-significant result from an underpowered study is largely uninformative rather than reassuring, and prospectively justifying sample size is an expected element of study reporting. This entry describes power and sample-size reasoning for appraisal and design purposes and is not a basis for individual diagnostic or treatment decisions.

Evidence & guidelines

Reporting standards for clinical trials and observational studies expect an a priori sample size justification, and methodological reviews have documented the widespread harms of low power. Button and colleagues showed that chronically underpowered fields yield unreliable literatures. Altman and Bland, and the guide to statistical misinterpretations by Greenland and colleagues, stress that low power explains many uninformative null results.

History

Power is a direct outgrowth of the Neyman-Pearson testing framework, which defined the Type II error rate of which power is the complement. Jacob Cohen's work from the 1960s onward, consolidated in his 1988 monograph, popularised systematic power analysis and effect-size conventions across the health and behavioural sciences. Concern about underpowered research intensified in the reproducibility debates of the 2010s.

Debates

Consequences of chronic underpowering: Persistently low power not only inflates false negatives but also lowers the probability that a statistically significant finding reflects a true effect and exaggerates the estimated size of the true effects that do reach significance, undermining the reliability of whole literatures.
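
Both consequences can be illustrated numerically. The sketch below is hedged throughout. The first function uses the positive predictive value formula from the underpowering literature, power x R / (power x R + alpha), where R is the pre-study odds that a tested effect is real. The second is a small simulation that assumes a normally distributed estimate and a positive true effect. The function names, the default prior odds of 0.25, and the simulation settings are all illustrative assumptions.

```python
import random
from statistics import NormalDist, mean


def positive_predictive_value(power, alpha=0.05, prior_odds=0.25):
    """Share of significant findings that reflect a true effect.

    `prior_odds` is the assumed ratio of real to null effects among the
    hypotheses being tested (a hypothetical input, rarely known in practice).
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


def exaggeration_ratio(true_effect, std_error, alpha=0.05, n_sims=20_000, seed=0):
    """Mean magnitude of significant estimates divided by the true effect.

    Simulates estimates drawn from Normal(true_effect, std_error), keeps
    those that clear a two-sided threshold, and averages their absolute
    size. Assumes `true_effect` is positive.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    threshold = NormalDist().inv_cdf(1 - alpha / 2) * std_error
    estimates = (rng.gauss(true_effect, std_error) for _ in range(n_sims))
    significant = [abs(e) for e in estimates if abs(e) > threshold]
    return mean(significant) / true_effect
```

With prior odds of 0.25 and alpha of 0.05, `positive_predictive_value` gives 0.80 at 80% power but only 0.50 at 20% power. In the simulation, a ratio well above 1 for a large standard error and close to 1 for a small one reflects the exaggeration described above: only inflated estimates clear the threshold in a small study.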

Key figures

Jacob Cohen
Jerzy Neyman
Egon Pearson
Douglas G. Altman
John P. A. Ioannidis

Seminal works

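Cohen (1988), Statistical Power Analysis for the Behavioral Sciences, 2nd edition
Button et al. (2013), Power failure: why small sample size undermines the reliability of neuroscience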

Frequently asked questions

What is statistical power in simple terms? It is the chance that a study will detect a real effect of a given size if that effect genuinely exists. Higher power means a better chance of not missing a true effect; 80% power is a common target.
Why does sample size matter so much? Larger samples increase power and improve the precision of estimates, so a study can reliably detect the effect it is looking for. Too small a sample risks both missing real effects and producing exaggerated significant findings.