Type I and Type II Errors
False positives, false negatives and power
In statistical hypothesis testing, two kinds of error are possible. A Type I error, or false positive, occurs when a true null hypothesis is rejected; its probability is denoted α. A Type II error, or false negative, occurs when a false null hypothesis is not rejected; its probability is denoted β. Statistical power (1 − β) is the probability of detecting an effect that really exists. The two error rates trade off: at a fixed sample size, reducing α increases β and lowers power, while larger samples and larger effect sizes raise power.
Core Concepts and Definitions
Within the Neyman–Pearson framework, a hypothesis test can go wrong in two ways. A Type I error occurs when the null hypothesis (H₀) is rejected even though it is true. Its probability is denoted α and is conventionally set at 0.05 or 0.01. A Type II error occurs when a false null hypothesis is not rejected; its probability is denoted β. Statistical power is the probability of rejecting H₀ when it is in fact false, that is, of correctly detecting a true effect, and equals 1 − β. The four decision outcomes (correct rejection, correct retention, Type I error, and Type II error) can be arranged in a 2 × 2 table that crosses the true state of H₀ with the decision taken.
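The definitions above can be checked empirically with a small simulation. The following is a minimal sketch rather than a reference implementation: it assumes a two-sided z-test of H₀: mean = 0 with known σ, and the function name `rejection_rate`, the sample size, the true mean of 0.5, and the seed are all illustrative choices that do not come from the text.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, n=25, sigma=1.0, alpha=0.05, trials=10_000, seed=0):
    """Fraction of simulated two-sided z-tests of H0: mean = 0 that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    se = sigma / n ** 0.5                         # standard error of the sample mean
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = fmean(sample) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# H0 is true (mean really is 0): every rejection is a Type I error.
type_i_rate = rejection_rate(true_mean=0.0)

# H0 is false (assumed true mean of 0.5): every rejection is a correct detection.
power = rejection_rate(true_mean=0.5)
type_ii_rate = 1 - power

print(f"Type I rate ~ {type_i_rate:.3f}, power ~ {power:.3f}, Type II rate ~ {type_ii_rate:.3f}")
```

Under these assumptions the estimated Type I rate should land close to the chosen α, while the estimated power depends on the assumed true mean, σ, and n.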
The Trade-off Among α, β, and Power
At a fixed sample size there is an inescapable trade-off between α and β: lowering the α threshold raises β and reduces power. The most direct way to reduce both error rates at once is to increase the sample size, although reducing measurement variability or using a more efficient design also helps. Power depends on sample size (n), effect size (δ), the variability of the data (σ), and the α level; increasing n, δ, or α raises power, while a larger σ lowers it. For example, the standard error is SE = σ/√n; as n grows, SE shrinks, making a true effect easier to detect and thus raising power. A priori power analysis determines the minimum sample size needed to achieve a desired power level before data collection begins.
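The dependence of power on n, δ, σ, and α can be made concrete for one simple case. The sketch below assumes a two-sided one-sample z-test with known σ; the helper names `z_test_power` and `required_n` are hypothetical, and the sample-size formula is the usual normal approximation, which ignores the negligible chance of rejecting in the wrong tail.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a two-sided z-test when the true mean differs from H0 by delta."""
    se = sigma / sqrt(n)                      # SE = sigma / sqrt(n)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta / se                        # true effect measured in SE units
    # Probability that the test statistic falls in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def required_n(delta, sigma, alpha=0.05, power=0.80):
    """Approximate minimum n for the target power (a priori power analysis)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Illustrative inputs: effect of 0.5 with sigma = 1.
for n in (10, 25, 50):
    print(n, round(z_test_power(delta=0.5, sigma=1.0, n=n), 3))

print("alpha=0.05:", round(z_test_power(0.5, 1.0, 25, alpha=0.05), 3))
print("alpha=0.01:", round(z_test_power(0.5, 1.0, 25, alpha=0.01), 3))
print("n for 80% power:", required_n(delta=0.5, sigma=1.0))
```

With the other inputs held fixed, power rises with n and falls when α is tightened from 0.05 to 0.01, which is the trade-off described above. For a t-test or any other design the formulas differ, so dedicated power-analysis software would be the safer choice.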
Common Misunderstandings
Misinterpretation of the p-value is a frequent source of confusion about Type I and Type II errors. The p-value is the probability of observing a result at least as extreme as the one obtained, assuming H₀ is true; it does not measure effect size, practical importance, or replicability. Failing to reject the null does not prove H₀ is true; it only means the data do not provide sufficient evidence against it. Equally, labeling every finding with p > 0.05 as 'null' is a mistake: in an underpowered study, the Type II error rate can be very high and a real effect may be missed. 'Statistically significant' and 'practically important' are distinct concepts.
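A short calculation illustrates why the p-value is not a measure of effect size. This is a sketch under stated assumptions: a two-sided one-sample z-test with known σ, a hypothetical helper `two_sided_p`, and invented observed means and sample sizes chosen only to make the contrast visible.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(observed_mean, sigma, n):
    """Two-sided p-value for H0: mean = 0, given an observed sample mean."""
    z = observed_mean / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# A tiny observed effect in a very large sample: significant, but practically negligible.
p_tiny_effect = two_sided_p(observed_mean=0.02, sigma=1.0, n=1_000_000)

# A sizeable observed effect in a very small sample: not significant at alpha = 0.05,
# yet the data are fully compatible with a real effect that the study was too small to detect.
p_large_effect = two_sided_p(observed_mean=0.5, sigma=1.0, n=10)

print(f"tiny effect, huge n:   p = {p_tiny_effect:.3g}")
print(f"large effect, small n: p = {p_large_effect:.3g}")
```

The first case crosses the 0.05 threshold and the second does not, even though the second effect is 25 times larger. Reporting the estimated effect with a confidence interval makes this difference visible in a way the p-value alone cannot.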
Importance in Research Practice
The asymmetric costs of the two error types influence how α and power targets should be set in a given research domain. In medical diagnostics, the cost of a Type II error (missing a disease) can be severe, so high power is prioritized. In drug approval, by contrast, a Type I error means accepting an ineffective treatment as effective, so keeping α low is critical. In the social sciences, the prevalence of underpowered studies is widely regarded as a major contributor to the replication crisis. Current reporting standards call for effect sizes and confidence intervals alongside p-values, and pre-registered study protocols explicitly address both Type I and Type II error risks as part of transparent research design.
Key thinkers
- Jerzy Neyman (1894–1981): Polish mathematician and statistician who, together with Egon Pearson, developed the Neyman–Pearson framework for hypothesis testing.
- Egon Pearson (1895–1980): British statistician who built on his father Karl Pearson's legacy and, through his collaboration with Neyman, laid the foundations of modern statistical decision theory.
Sources
- Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society A, 231, 289–337. DOI: 10.1098/rsta.1933.0009