Hypothesis: Null and Alternative

The logic of statistical hypothesis testing

Statistical hypothesis testing is an inferential framework for evaluating testable claims about a population using data. The null hypothesis (H₀) asserts "no effect"; the alternative hypothesis (H₁) asserts there is one. The researcher assumes H₀, measures how surprising the observed data are under that assumption, and either rejects H₀ or fails to reject it. "Failing to reject" is never equivalent to "accepting" H₀, and H₁ is never proven. Fisher's significance testing and the Neyman-Pearson decision framework rest on distinct philosophical foundations.

Core Concepts and Definitions

A statistical hypothesis is a proposition about a population parameter stated in advance, in a form testable with data. The null hypothesis (H₀) typically carries a claim of no effect, for example "there is no mean difference between groups" or "the correlation is zero". The alternative hypothesis (H₁ or Hₐ) expresses the effect or difference the researcher expects to detect. The test asks how surprising the observed data would be if H₀ were true, and summarises the answer as a p-value: the probability of obtaining results at least as extreme as those observed, given that H₀ is true. The smaller the p-value, the less compatible the data are with H₀.
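As a rough illustration of the p-value definition, the sketch below computes a two-sided p-value for a one-sample z-test. The function name, the numbers, and the assumption that the population standard deviation is known are all illustrative choices, not part of the text above.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value of a one-sample z-test (sigma assumed known)."""
    # Standardise the observed mean under H0: mu = mu0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability, under H0, of a |Z| at least as large as the one observed.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: n = 100 observations with mean 103, tested against mu0 = 100.
p = two_sided_p_value(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(f"p = {p:.4f}")  # z = 2.0, so this prints p = 0.0455
```

Here the observed mean lies two standard errors from the value stated by H₀, so results at least this extreme would occur in roughly 4.6% of samples if H₀ were true.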

How It Works: Directionality and Two Competing Frameworks

Hypotheses may be directional (one-tailed) or non-directional (two-tailed). A non-directional test states H₁: μ₁ ≠ μ₂, while a directional test states H₁: μ₁ > μ₂ (or μ₁ < μ₂). In Fisher's framework only H₀ is specified; the p-value is read as a continuous measure of how inconsistent the data are with H₀, and evidence accumulates across studies. In the Neyman-Pearson decision framework both H₀ and H₁ are specified before data collection; α (Type I error rate) and β (Type II error rate) are set in advance, and the test yields a binary decision: "reject" or "do not reject" H₀. These two approaches are philosophically distinct, yet they are routinely and incorrectly blended in practice.
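The difference between directional and non-directional tests, and between a Fisher-style p-value and a Neyman-Pearson-style decision, can be sketched as follows. This is a minimal example that assumes a z statistic has already been computed; the helper names `p_values` and `decide` and the value z = 1.80 are hypothetical.

```python
from statistics import NormalDist


def p_values(z):
    """P-values for one observed z statistic under three forms of H1."""
    std = NormalDist()
    return {
        "two_sided": 2 * (1 - std.cdf(abs(z))),  # H1: mu1 != mu2
        "greater": 1 - std.cdf(z),               # H1: mu1 > mu2
        "less": std.cdf(z),                      # H1: mu1 < mu2
    }


def decide(p, alpha=0.05):
    """Neyman-Pearson style binary decision at a pre-specified alpha."""
    return "reject H0" if p <= alpha else "do not reject H0"


z = 1.80  # hypothetical observed test statistic
results = p_values(z)
for name, p in results.items():
    print(f"{name:>9}: p = {p:.4f} -> {decide(p)}")
```

With this statistic the directional test (H₁: μ₁ > μ₂) rejects H₀ at α = 0.05 while the non-directional test does not, which is one reason the direction has to be chosen before the data are seen.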

Common Misconceptions

The most frequent error is interpreting "failed to reject H₀" as "H₀ is true"; the test may simply have lacked sufficient statistical power due to a small sample. Conversely, rejecting H₀ does not prove H₁ — it provides probabilistic evidence against H₀. The p-value carries no information about effect size: with a very large sample, a practically trivial difference can reach statistical significance. The p < 0.05 threshold is also a convention, not a law; Fisher never prescribed it as a rigid rule. Reporting effect sizes (Cohen's d, η², r) and confidence intervals alongside p-values provides a far richer and more honest account of the data.
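The point that a p-value carries no information about effect size can be illustrated with a small calculation. The sketch assumes a two-sample z-test with equal group sizes and a common, known standard deviation, so that the test statistic is z = d·√(n/2) for a standardised difference d (Cohen's d). The function name and the value d = 0.02 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def p_from_effect(d, n_per_group):
    """Two-sided p-value of a two-sample z-test, given Cohen's d and group size.

    Assumes two equal-sized groups with a common, known standard deviation,
    so the test statistic is z = d * sqrt(n / 2).
    """
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


d = 0.02  # hypothetical, practically trivial standardised difference
p_small = p_from_effect(d, 100)      # modest sample
p_large = p_from_effect(d, 100_000)  # very large sample

print(f"d = {d}, n = 100 per group:     p = {p_small:.3g}")
print(f"d = {d}, n = 100,000 per group: p = {p_large:.3g}")
```

The effect size is identical in both calls; only the sample size changes, yet the second p-value falls far below 0.05.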

Importance in Research Practice

Understanding the logic of hypothesis testing is essential for both study design and the interpretation of findings. Power analysis uses the expected effect size under H₁ together with pre-specified α and β to determine the sample size needed to detect a meaningful effect. Hypotheses should be registered before data collection; hypotheses generated after seeing the data are exploratory and cannot legitimately be presented as confirmatory. The reproducibility crisis affecting many fields stems in part from the misapplication of hypothesis testing: p-hacking, HARKing (hypothesising after results are known), and selective reporting are primary culprits. Adhering to pre-specified α and β within the Neyman-Pearson framework keeps long-run error rates under control.
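The sample-size step of a power analysis can be sketched with the usual normal approximation for a two-sided, two-sample comparison of means: n per group ≈ 2·((z₁₋α/₂ + z₁₋β) / d)², where d is the standardised effect size the study should be able to detect. The function name and the default values are illustrative assumptions, and an exact t-test calculation would give a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample test.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where power = 1 - beta and d is the standardised effect size.
    """
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha / 2)  # critical value for the Type I rate
    z_power = std.inv_cdf(power)          # quantile matching 1 - beta
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Hypothetical planning values: medium effect, alpha = 0.05, beta = 0.20.
n = n_per_group(d=0.5, alpha=0.05, power=0.80)
print(f"about {n} participants per group")  # prints: about 63 participants per group
```

Halving the expected effect size roughly quadruples the required sample, which is why the effect size assumed at the design stage needs to be justified rather than chosen for convenience.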

Key thinkers

  • Jerzy Neyman (1894–1981): Together with Egon Pearson, developed the decision-theoretic hypothesis testing framework and introduced the concepts of Type I and Type II errors to statistics.
  • Egon Pearson (1895–1980): In the framework co-developed with Neyman, argued that both H₀ and H₁ must be specified before testing, establishing the alternative hypothesis as a formal component of statistical inference.
  • Ronald A. Fisher (1890–1962): Pioneer of significance testing who defined the p-value and advocated testing H₀ alone; explicitly criticized the Neyman-Pearson binary decision approach as philosophically misguided.

Sources

  1. Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society A, 231, 289–337. DOI: 10.1098/rsta.1933.0009