ScholarGate

Statistical Hypothesis Testing

Hypothesis testing is the theory of deciding between competing statements about a population from data, while controlling the chance of each kind of error.

Definition

A statistical hypothesis test is a rule that uses sample data to decide whether to reject a null hypothesis in favor of an alternative, designed so that the probability of wrongly rejecting a true null is bounded by a chosen significance level.
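As a minimal sketch of a test as a decision rule, the function below assumes i.i.d. normal observations with a known standard deviation and tests whether their mean equals `mu0`. This is a one-sample, two-sided z-test, chosen only for illustration. The function name `z_test_rejects` and the sample values are hypothetical. The rejection region is fixed before the data are seen, so under these assumptions a true null is rejected with probability exactly `alpha`.

```python
import math
from statistics import NormalDist, fmean

def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test: True if H0 (mean == mu0) is rejected at level alpha.

    Assumes i.i.d. normal observations with known standard deviation sigma.
    """
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    # Critical value chosen so that P(|Z| > crit) = alpha when H0 holds.
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > crit

# Hypothetical measurements; their mean is 5.55.
data = [5.3, 4.9, 5.8, 6.1, 5.5, 5.2, 5.9, 5.7]
print(z_test_rejects(data, mu0=5.0, sigma=0.5))  # True  (z is about 3.11)
print(z_test_rejects(data, mu0=5.5, sigma=0.5))  # False (z is about 0.28)
```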

Scope

This area covers the formulation of null and alternative hypotheses, the two types of error and the size and power of a test, the Neyman-Pearson lemma for the most powerful test of simple hypotheses, monotone likelihood ratio and uniformly most powerful tests, unbiased and invariant tests, the likelihood-ratio test and the large-sample chi-squared distribution of its statistic, p-values and their interpretation, and the problem of testing many hypotheses at once.

Core questions

  • How are the size and power of a test defined, and how are the two types of error traded off?
  • What test is most powerful for deciding between two simple hypotheses?
  • When does a uniformly most powerful test exist for a one-sided alternative?
  • How should significance be controlled when many hypotheses are tested simultaneously?
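One standard answer to the last question is to control the family-wise error rate, the probability of making at least one false rejection among all the tests. The sketch below implements Holm's step-down procedure, which controls that rate at `alpha` for any dependence among the tests and rejects at least as many hypotheses as the simpler Bonferroni rule (reject when p <= alpha / m). The function name `holm_reject` and the p-values are illustrative, not taken from any study.

```python
def holm_reject(pvalues, alpha=0.05):
    """Holm step-down procedure: return one boolean per hypothesis, True = reject.

    Controls the family-wise error rate at level alpha under arbitrary dependence.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    reject = [False] * m
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is compared with alpha / (m - rank).
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; all larger p-values are retained
    return reject

pvals = [0.012, 0.300, 0.004, 0.019]  # illustrative values
print(holm_reject(pvals))                         # [True, False, True, True]
print([p <= 0.05 / len(pvals) for p in pvals])    # Bonferroni: [True, False, True, False]
```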

Key theories

Neyman-Pearson lemma
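Among all tests of significance level α for a simple null against a simple alternative, the most powerful is the test that rejects when the likelihood ratio of the alternative to the null exceeds a threshold, with the threshold (and, for discrete data, a randomization at the boundary) chosen so that the size equals α.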
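A minimal sketch of the lemma in a discrete setting, assuming X ~ Binomial(n, p) with the simple hypotheses p = p0 and p = p1 > p0 (the values n = 10, p0 = 0.5, p1 = 0.7 are arbitrary). The likelihood ratio increases with x, so the most powerful test rejects for large x; because X is discrete, hitting the size exactly requires rejecting with some probability gamma at the boundary value c. The helper names `binom_pmf` and `np_binomial_test` are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def np_binomial_test(n, p0, p1, alpha):
    """Most powerful size-alpha test of p = p0 against p = p1 > p0, X ~ Bin(n, p).

    The likelihood ratio (p1/p0)**x * ((1-p1)/(1-p0))**(n-x) increases in x,
    so the test rejects when x > c and rejects with probability gamma when x == c.
    Returns (c, gamma, size, power).
    """
    assert 0 < p0 < p1 < 1 and 0 < alpha < 1
    tail, c = 0.0, n  # tail holds P0(X > c)
    # Move c down while P0(X >= c) still fits within alpha.
    while tail + binom_pmf(c, n, p0) <= alpha:
        tail += binom_pmf(c, n, p0)
        c -= 1
    gamma = (alpha - tail) / binom_pmf(c, n, p0)  # randomize on the boundary
    size = tail + gamma * binom_pmf(c, n, p0)
    power = sum(binom_pmf(k, n, p1) for k in range(c + 1, n + 1)) + gamma * binom_pmf(c, n, p1)
    return c, gamma, size, power

c, gamma, size, power = np_binomial_test(n=10, p0=0.5, p1=0.7, alpha=0.05)
print(c, round(gamma, 3), round(size, 3), round(power, 3))
```

With these values the sketch gives c = 8, gamma of about 0.893, size 0.05, and power of about 0.358 against p = 0.7.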
Uniformly most powerful and unbiased tests
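For one-parameter families with a monotone likelihood ratio in some statistic T, the test that rejects for large values of T is most powerful against every alternative on one side simultaneously, and is therefore uniformly most powerful for one-sided hypotheses (the Karlin-Rubin theorem). When no such test exists, as is typical for two-sided alternatives, optimality is sought within the smaller classes of unbiased or invariant tests.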
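As a sketch of the monotone-likelihood-ratio case, assume i.i.d. normal data with known standard deviation and the one-sided hypotheses H0: mu <= mu0 against H1: mu > mu0. The normal family has a monotone likelihood ratio in the sample mean, so the rejection region (sample mean above mu0 + z_{1-alpha} sigma / sqrt(n)) is the same whichever alternative mu1 > mu0 is considered. The function `one_sided_power` is a hypothetical name, and its default parameter values are illustrative.

```python
import math
from statistics import NormalDist

def one_sided_power(mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """P(reject H0) when the true mean is mu, for H0: mu <= mu0 vs H1: mu > mu0.

    Assumes i.i.d. normal data with known sigma (illustrative defaults).
    """
    se = sigma / math.sqrt(n)
    # One cutoff serves every alternative mu1 > mu0: this is what makes the test UMP.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # The sample mean is N(mu, se^2), so P(reject) = P(xbar > cutoff).
    return 1 - NormalDist(mu, se).cdf(cutoff)

for mu in [-0.2, 0.0, 0.2, 0.4, 0.6]:
    print(f"mu = {mu:+.1f}   power = {one_sided_power(mu):.3f}")
```

The power function equals alpha at mu = mu0, stays below alpha throughout the null region, and increases with mu, so the test has size alpha for the composite null while being most powerful at each alternative.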
Likelihood-ratio tests
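The generalized likelihood-ratio statistic compares the maximized likelihood under the null with the maximized likelihood under the full model. Under regularity conditions and when the null is true, minus twice its logarithm is asymptotically chi-squared (Wilks' theorem), with degrees of freedom equal to the number of parameters the null constrains, which gives a general-purpose test.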
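A minimal sketch for one concrete model, assuming X ~ Binomial(n, p) and H0: p = p0 against an unrestricted p. The null fixes one parameter, so the reference distribution is chi-squared with one degree of freedom; its upper tail can be written P(chi2_1 > w) = erfc(sqrt(w / 2)), because a chi-squared(1) variable is the square of a standard normal. The function names and the counts (62 successes in 100 trials) are hypothetical.

```python
import math

def xlog_ratio(k, a, b):
    """k * log(a / b), using the convention 0 * log(0) = 0."""
    return 0.0 if k == 0 else k * math.log(a / b)

def lrt_binomial(x, n, p0):
    """Return (statistic, asymptotic p-value) for H0: p = p0 with X ~ Bin(n, p).

    The statistic is -2 log(L(p0) / L(phat)), where phat = x / n is the unrestricted MLE.
    """
    phat = x / n
    stat = 2 * (xlog_ratio(x, phat, p0) + xlog_ratio(n - x, 1 - phat, 1 - p0))
    # Upper tail of chi-squared with 1 df: P(Z^2 > w) = erfc(sqrt(w / 2)).
    pvalue = math.erfc(math.sqrt(stat / 2))
    return stat, pvalue

stat, p = lrt_binomial(x=62, n=100, p0=0.5)
print(f"-2 log LR = {stat:.3f}, asymptotic p = {p:.4f}")
```

For these counts the statistic is about 5.82 and the asymptotic p-value about 0.016.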

Clinical relevance

Hypothesis tests underpin the evaluation of clinical trials, A/B testing, quality control, and signal detection, where controlling false-positive rates and ensuring adequate power directly affect which interventions, products, or discoveries are accepted as real.
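Adequate power is usually secured at the design stage through a sample-size calculation. The sketch below uses the common normal-approximation formula for a two-sided comparison of two proportions with equal group sizes, such as the conversion rates of two variants in an A/B test. It omits any continuity correction, and the function name `n_per_group` and the rates of 10% and 12% are illustrative assumptions.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion z-test.

    Normal approximation without continuity correction; equal allocation assumed.
    """
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha / 2)  # bounds the Type I error rate at alpha
    z_b = z(power)          # power = 1 - Type II error rate
    pbar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * pbar * (1 - pbar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Illustrative: detecting a change in conversion rate from 10% to 12%.
print(n_per_group(0.10, 0.12))
```

Because the required size scales roughly with 1 / (p1 - p2)^2, halving the difference to be detected roughly quadruples the sample needed.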

History

Fisher systematized significance testing and popularized p-values in the 1920s, notably in Statistical Methods for Research Workers (1925). Neyman and Pearson introduced the decision-theoretic framework of two hypotheses, two types of error, and power in 1933, and Lehmann's Testing Statistical Hypotheses (first edition 1959), later revised with Romano, organized the optimality theory of tests.

Debates

Fisherian significance versus Neyman-Pearson decisions
Fisher viewed the p-value as a continuous measure of evidence against the null, while Neyman and Pearson framed testing as a decision with fixed error rates; the two philosophies are often blended in practice and the difference remains contested.

Key figures

  • Jerzy Neyman
  • Egon Pearson
  • Ronald A. Fisher
  • Erich L. Lehmann

Seminal works

  • Lehmann, E. L., & Romano, J. P. (2005). Testing Statistical Hypotheses (3rd ed.). Springer.

Frequently asked questions

What is the difference between a Type I and a Type II error?
A Type I error rejects a true null hypothesis (a false positive); a Type II error fails to reject a false null (a false negative). The significance level bounds the probability of a Type I error, and the power of a test against a given alternative equals one minus the probability of a Type II error at that alternative.
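The two error rates can be checked by simulation. A minimal sketch, assuming a one-sided z-test of H0: mu = 0 against mu > 0 with known sigma = 1 and n = 10 observations per experiment (all values chosen for illustration): the rejection rate when the null is true estimates the Type I error rate, and the non-rejection rate under a specific alternative estimates the Type II error rate there. The helper name `rejection_rate` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def rejection_rate(mu_true, n=10, alpha=0.05, reps=20_000, seed=1):
    """Share of simulated experiments in which the one-sided z-test rejects H0: mu = 0.

    Each experiment draws n observations from N(mu_true, 1); sigma = 1 is assumed known.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu_true, 1.0) for _ in range(n)])
        z = xbar * math.sqrt(n)  # (xbar - 0) / (1 / sqrt(n))
        if z > crit:
            rejections += 1
    return rejections / reps

type1 = rejection_rate(mu_true=0.0)      # H0 true: estimates the Type I error rate
type2 = 1 - rejection_rate(mu_true=0.5)  # H0 false: estimates the Type II error rate at mu = 0.5
print(f"Type I ~ {type1:.3f}   Type II at mu = 0.5 ~ {type2:.3f}   power ~ {1 - type2:.3f}")
```

For this design the exact values are 0.05 for the Type I error rate and roughly 0.53 for the Type II error rate at mu = 0.5 (power about 0.47), so the simulated estimates should land close to both.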
Does a small p-value prove the alternative hypothesis?
No. A small p-value means that data at least as extreme as those observed would be unlikely if the null were true. It is evidence against the null, not the probability that the null is false, and it does not by itself establish that an effect is large enough to matter in practice.
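The last point can be illustrated numerically. Assuming a two-sided z-test with known sigma = 1 and holding the observed sample mean fixed at a hypothetical 0.005 standard deviations (a negligible effect for most purposes), the p-value shrinks toward zero as the sample grows. The function name `two_sided_p` and all numbers are illustrative, not data.

```python
import math
from statistics import NormalDist

def two_sided_p(xbar, mu0, sigma, n):
    """Two-sided z-test p-value, P(|Z| >= |z_obs|) under H0, with sigma known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 2 * NormalDist().cdf(-abs(z))  # lower-tail form avoids cancellation for large |z|

# The same observed effect, 0.005 standard deviations, at growing sample sizes.
for n in [100, 10_000, 1_000_000]:
    print(f"n = {n}   p = {two_sided_p(0.005, 0.0, 1.0, n):.2g}")
```

The effect is identical in every row; only the sample size changes, and the p-value falls from about 0.96 to below one in a million.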
