Statistical Hypothesis Testing

Hypothesis testing is the theory of deciding between competing statements about a population from data, while controlling the chance of each kind of error.

Definition

A statistical hypothesis test is a rule that uses sample data to decide whether to reject a null hypothesis in favor of an alternative, designed so that the probability of wrongly rejecting a true null is bounded by a chosen significance level.
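
As an illustration of this definition, the following is a minimal sketch of one such rule: a one-sided z-test for a normal mean, assuming the standard deviation is known. The function names (critical_value, reject_null, power) and the normal model are assumptions chosen for the example, not part of the definition. The cutoff is picked so that the rejection probability under the null equals the significance level, and the same formula gives the power at any alternative mean.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def critical_value(alpha: float) -> float:
    """Cutoff c with P(Z > c) = alpha for a standard normal Z."""
    return STD_NORMAL.inv_cdf(1.0 - alpha)


def z_statistic(sample, mu0: float, sigma: float) -> float:
    """Standardized distance of the sample mean from the null mean mu0."""
    n = len(sample)
    sample_mean = sum(sample) / n
    return (sample_mean - mu0) / (sigma / sqrt(n))


def reject_null(sample, mu0: float, sigma: float, alpha: float = 0.05) -> bool:
    """Decision rule: reject H0: mu = mu0 in favor of H1: mu > mu0."""
    return z_statistic(sample, mu0, sigma) > critical_value(alpha)


def power(mu1: float, mu0: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Probability that the rule rejects when the true mean is mu1."""
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    return 1.0 - STD_NORMAL.cdf(critical_value(alpha) - shift)


# At the null itself the rejection probability is the size of the test.
print(power(mu1=0.0, mu0=0.0, sigma=1.0, n=25))
# At an alternative it is the power, one minus the Type II error probability.
print(power(mu1=0.5, mu0=0.0, sigma=1.0, n=25))
```

Evaluating power at the null mean returns the significance level, which is the sense in which the rule bounds the probability of wrongly rejecting a true null.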

Scope

This area covers the formulation of null and alternative hypotheses, the two types of error and the size and power of a test, the Neyman-Pearson lemma for the most powerful test of simple hypotheses, monotone likelihood ratio and uniformly most powerful tests, unbiased and invariant tests, the likelihood-ratio test and its large-sample chi-squared distribution, p-values and their interpretation, and the problem of testing many hypotheses at once.

Core questions

How are the size and power of a test defined, and how are the two types of error traded off?
What test is most powerful for deciding between two simple hypotheses?
When does a uniformly most powerful test exist for a one-sided alternative?
How should significance be controlled when many hypotheses are tested simultaneously?
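
For the last question, two standard corrections that control the family-wise error rate can be sketched as follows. This is an illustrative sketch, not the only approach: the function names bonferroni and holm and the example p-values are made up for the demonstration, and procedures that control the false discovery rate instead are not shown.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m, where m is the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Visit p-values from smallest to largest; the p-value at 0-based rank k
    is compared with alpha / (m - k). Stop at the first failure and retain
    every remaining hypothesis.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


# Hypothetical p-values from four tests.
p_values = [0.010, 0.040, 0.020, 0.005]
print(bonferroni(p_values))  # [True, False, False, True]
print(holm(p_values))        # [True, True, True, True]
```

Holm's procedure rejects everything Bonferroni rejects and sometimes more, while giving the same family-wise guarantee, which is why it is often preferred when no further structure is assumed.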

Key theories

Neyman-Pearson lemma: Among all tests of a given size for two simple hypotheses, the likelihood-ratio test, which rejects when the ratio of the likelihood under the alternative to the likelihood under the null exceeds a threshold, is most powerful.
Uniformly most powerful and unbiased tests: For families with monotone likelihood ratio a single test is most powerful against every alternative on one side; when no such test exists, optimality is sought within the unbiased or invariant classes.
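Likelihood-ratio tests: The generalized likelihood-ratio statistic is the ratio of the maximized likelihood under the null to the maximized likelihood under the full model; under regularity conditions, minus twice its logarithm is asymptotically chi-squared under the null, with degrees of freedom equal to the number of parameters the null restricts, giving a general-purpose test.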
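
A generalized likelihood-ratio test can be sketched for a simple case: testing whether a binomial success probability equals a hypothesized value p_null against an unrestricted alternative. The binomial model and the function names below are assumptions made for the example. The statistic is minus twice the log of the likelihood ratio, and the p-value uses the large-sample chi-squared approximation with one degree of freedom, so it is approximate for small samples.

```python
from math import erfc, log, sqrt


def _count_log_ratio(count: int, p_hat: float, p_null: float) -> float:
    """count * log(p_hat / p_null), using the convention 0 * log(0) = 0."""
    return 0.0 if count == 0 else count * log(p_hat / p_null)


def lrt_statistic(successes: int, n: int, p_null: float) -> float:
    """Minus twice the log likelihood ratio for H0: p = p_null (0 < p_null < 1)."""
    p_hat = successes / n  # maximum likelihood estimate under the full model
    failures = n - successes
    return 2.0 * (
        _count_log_ratio(successes, p_hat, p_null)
        + _count_log_ratio(failures, 1.0 - p_hat, 1.0 - p_null)
    )


def chi2_one_df_tail(x: float) -> float:
    """P(X > x) for a chi-squared variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2.0))


def lrt_p_value(successes: int, n: int, p_null: float) -> float:
    """Approximate p-value from the asymptotic chi-squared(1) distribution."""
    stat = max(lrt_statistic(successes, n, p_null), 0.0)  # guard against rounding
    return chi2_one_df_tail(stat)


# Hypothetical data: 60 successes in 100 trials, null value 0.5.
print(lrt_statistic(60, 100, 0.5))
print(lrt_p_value(60, 100, 0.5))
```

One parameter is restricted by the null here, which is why a single degree of freedom is used; a null that fixes several parameters would need the chi-squared tail with correspondingly more degrees of freedom.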

Clinical relevance

Hypothesis tests underpin the evaluation of clinical trials, A/B testing, quality control, and signal detection, where controlling false-positive rates and ensuring adequate power directly affect which interventions, products, or discoveries are accepted as real.

History

Fisher developed significance testing and p-values in the 1920s. Neyman and Pearson introduced the decision-theoretic framework of two hypotheses, errors, and power in 1933, and Lehmann's mid-century work, continued with Romano, organized the optimality theory of tests.

Debates

Fisherian significance versus Neyman-Pearson decisions: Fisher viewed the p-value as a continuous measure of evidence against the null, while Neyman and Pearson framed testing as a decision with fixed error rates; the two philosophies are often blended in practice and the difference remains contested.

Key figures

Jerzy Neyman
Egon Pearson
Ronald A. Fisher
Erich L. Lehmann

Seminal works

Lehmann, E. L., and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed. Springer.

Frequently asked questions

What is the difference between a Type I and a Type II error? A Type I error is the rejection of a true null hypothesis (a false positive); a Type II error is the failure to reject a false null hypothesis (a false negative). The significance level bounds the probability of the first, and power equals one minus the probability of the second.
Does a small p-value prove the alternative hypothesis? No. A small p-value indicates that data at least as extreme as those observed would be unlikely if the null were true; it is evidence against the null, not the probability that the null is false, and it does not by itself establish practical importance.
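
The point about practical importance can be made concrete with a small sketch. It assumes a two-sided z-test with known standard deviation, and the function name two_sided_p_value and the numbers are hypothetical. It supposes the observed mean difference is the same tiny amount (one hundredth of a standard deviation) at two sample sizes: the p-value is unremarkable at the small size and extremely small at the large one, although the effect itself has not changed.

```python
from math import erfc, sqrt


def two_sided_p_value(observed_diff: float, sigma: float, n: int) -> float:
    """Two-sided z-test p-value for H0: mean difference = 0, sigma known."""
    z = observed_diff / (sigma / sqrt(n))
    # P(|Z| > |z|) for a standard normal Z.
    return erfc(abs(z) / sqrt(2.0))


# Same observed difference of 0.01 standard deviations at both sample sizes.
small_sample = two_sided_p_value(observed_diff=0.01, sigma=1.0, n=100)
large_sample = two_sided_p_value(observed_diff=0.01, sigma=1.0, n=1_000_000)

print(small_sample)  # well above any conventional significance level
print(large_sample)  # vanishingly small, yet the effect is still 0.01 sigma
```

Reporting an effect-size estimate alongside the p-value is one common way to keep the two questions separate.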