Multiple Hypothesis Testing
When many hypotheses are tested at once, false positives accumulate; multiple-testing procedures control the overall error at the cost of some power.
Definition
Multiple hypothesis testing is the simultaneous testing of several hypotheses with procedures that control a global error criterion, such as the probability of any false rejection or the expected proportion of false rejections.
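The two criteria can be written in conventional notation. The symbols are standard in the literature but are assumptions of this sketch and are not defined elsewhere in the entry: m hypotheses are tested, V is the number of true null hypotheses rejected, and R is the total number of rejections.

```latex
\mathrm{FWER} = \Pr(V \ge 1), \qquad
\mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right],
\qquad
\frac{V}{\max(R,\,1)} \le \mathbf{1}\{V \ge 1\}
\;\Longrightarrow\;
\mathrm{FDR} \le \mathrm{FWER}.
```

The last inequality holds because the proportion of false rejections never exceeds the indicator that at least one false rejection occurred. It follows that any procedure controlling the family-wise error rate also controls the false discovery rate at the same level.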
Scope
This topic covers the multiplicity problem and the resulting inflation of false positives; the family-wise error rate and procedures that control it, such as Bonferroni, Holm, and Sidak; the false discovery rate and the Benjamini-Hochberg procedure; the distinction between weak and strong control; dependence among tests; and the trade-off between error control and detection power in large-scale testing.
Core questions
- Why does testing many hypotheses inflate the chance of at least one false positive?
- How do the Bonferroni and Holm procedures control the family-wise error rate?
- What is the false discovery rate, and how does the Benjamini-Hochberg procedure control it?
- How does dependence among tests affect these guarantees?
Key theories
- Family-wise error rate control
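- The Bonferroni procedure bounds the probability of any false rejection by testing each hypothesis at the overall level divided by the number of tests. The Holm step-down procedure compares the ordered p-values with successively less strict thresholds. It gives the same control and always rejects at least as many hypotheses, so it is never less powerful. Both guarantees hold under any dependence among the tests.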
- False discovery rate and Benjamini-Hochberg
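- Instead of preventing every false rejection, the false discovery rate controls the expected fraction of rejections that are false. The Benjamini-Hochberg step-up procedure controls it at the chosen level when the tests are independent or positively dependent. When many hypotheses are tested, it is typically far more powerful than family-wise methods.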
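Below is a minimal plain-Python sketch of the three procedures named above. The function names and the interface are illustrative assumptions, not a particular library's API: each function takes a list of p-values and a level alpha, and returns reject decisions in the original order.

```python
"""Minimal sketches of the Bonferroni, Holm and Benjamini-Hochberg rules.

Hypothetical interface: each function takes a list of p-values and a level
alpha and returns a list of booleans (True = reject) in the input order.
"""
from typing import List


def bonferroni(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    # Test every hypothesis at level alpha / m.
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    # Step-down: visit p-values from smallest to largest, comparing the one at
    # 0-based position i with alpha / (m - i); stop at the first failure.
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    reject = [False] * m
    for i, j in enumerate(order):
        if pvals[j] <= alpha / (m - i):
            reject[j] = True
        else:
            break  # all larger p-values are retained as well
    return reject


def benjamini_hochberg(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    # Step-up: find the largest 1-based rank k with p_(k) <= k * alpha / m,
    # then reject the k hypotheses with the smallest p-values.
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for j in order[:k]:
        reject[j] = True
    return reject


if __name__ == "__main__":
    p = [0.001, 0.004, 0.006, 0.012, 0.020, 0.040, 0.19, 0.33, 0.61, 0.90]
    # Number of rejections at alpha = 0.05 for each rule.
    print(sum(bonferroni(p)), sum(holm(p)), sum(benjamini_hochberg(p)))  # 2 3 5
```

On these ten made-up p-values, Holm rejects one more hypothesis than Bonferroni, and Benjamini-Hochberg rejects two more than Holm. For real analyses, the multipletests function in statsmodels offers these rules as method "bonferroni", "holm" and "fdr_bh". The sketch only shows the decision rules.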
Clinical relevance
Multiple-testing control is essential in genome-wide association studies, neuroimaging, and high-throughput screening, where thousands of hypotheses are tested at once and false-discovery-rate methods determine which findings are reported as discoveries.
History
Concern with multiple comparisons goes back to Tukey and the simultaneous-inference work of the mid-twentieth century. Holm introduced his step-down family-wise procedure in 1979, and Benjamini and Hochberg's 1995 false-discovery-rate paper transformed large-scale testing.
Debates
- Family-wise error rate versus false discovery rate
- Controlling the probability of any false rejection is conservative and costs power, while controlling the expected proportion of false discoveries is more powerful but tolerates some false positives; which criterion is appropriate depends on the cost of errors in the application.
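A small Monte Carlo sketch makes the trade-off visible. It estimates the family-wise error rate, the false discovery rate and the average power of Bonferroni and Benjamini-Hochberg on the same simulated p-values. Every setting is an illustrative assumption:

- 1,000 independent one-sided z-tests;
- 100 of them with a true shift of 3 standard errors;
- level 0.05 and 200 replicates;
- hypothetical helper names.

```python
"""Monte Carlo sketch of the FWER-versus-FDR trade-off.

Illustrative assumptions: m = 1000 independent one-sided z-tests, the first
m1 = 100 of which are false nulls with mean shift mu = 3; alpha = 0.05.
"""
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_pvalues(rng, m, m1, mu):
    # False nulls: z ~ N(mu, 1); true nulls: z ~ N(0, 1).
    # One-sided p-value P(Z >= z) computed as Phi(-z) for accuracy.
    return [STD_NORMAL.cdf(-rng.gauss(mu if i < m1 else 0.0, 1.0)) for i in range(m)]


def bonferroni_rejections(pvals, alpha):
    # Indices of hypotheses with p-value at most alpha / m.
    m = len(pvals)
    return {i for i, p in enumerate(pvals) if p <= alpha / m}


def bh_rejections(pvals, alpha):
    # Benjamini-Hochberg step-up: largest k with p_(k) <= k * alpha / m.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    return set(order[:k])


def summarize(procedure, reps=200, m=1000, m1=100, mu=3.0, alpha=0.05, seed=1):
    """Estimate FWER, FDR and average power (share of false nulls rejected)."""
    rng = random.Random(seed)  # same seed => same p-values for each procedure
    any_false, fdp_sum, power_sum = 0, 0.0, 0.0
    for _ in range(reps):
        rejected = procedure(simulate_pvalues(rng, m, m1, mu), alpha)
        false_rej = sum(1 for i in rejected if i >= m1)  # true nulls rejected
        true_rej = len(rejected) - false_rej
        any_false += false_rej > 0
        fdp_sum += false_rej / max(len(rejected), 1)
        power_sum += true_rej / m1
    return {"FWER": any_false / reps, "FDR": fdp_sum / reps, "power": power_sum / reps}


if __name__ == "__main__":
    for name, proc in [("Bonferroni", bonferroni_rejections), ("BH", bh_rejections)]:
        print(name, summarize(proc))
```

On the same data, Benjamini-Hochberg always rejects every hypothesis that Bonferroni rejects, so its estimated power can never be lower. In exchange, it will usually show a much higher family-wise error rate. Its estimated false discovery rate should stay near or below the nominal 0.05.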
Key figures
- Yoav Benjamini
- Yosef Hochberg
- Sture Holm
- John W. Tukey
Related topics
Seminal works
- benjaminiHochberg1995
Frequently asked questions
- Why not just use the usual significance level for each test?
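- Because the chance of rejecting at least one true null grows quickly with the number of tests. For example, suppose twenty independent tests are each run at the five percent level and all twenty null hypotheses are true. The chance of at least one false positive is then one minus 0.95 raised to the twentieth power, roughly sixty-four percent.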
- Is the false discovery rate always better than Bonferroni?
- Not always. False-discovery-rate control gives more power when many hypotheses are tested but tolerates some false discoveries; when even one false positive is costly, family-wise control such as Bonferroni or Holm is preferred.
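The arithmetic behind the twenty-test answer can be sketched as follows, assuming independent tests with every null hypothesis true. The function names are hypothetical. The sketch also checks the exact formula by simulation and compares two per-test levels:

- the Bonferroni level, which keeps the family-wise rate just under the target;
- the Sidak level, one minus (1 - alpha) raised to the power 1/m, which makes it exactly alpha under independence.

```python
"""Chance of at least one false positive across m independent tests.

Assumes every null hypothesis is true, the tests are independent, and each
test is run at the same per-test level.
"""
import random


def fwer_independent(m: int, alpha: float = 0.05) -> float:
    # P(at least one rejection) = 1 - P(no rejection) = 1 - (1 - alpha)^m
    return 1.0 - (1.0 - alpha) ** m


def sidak_level(m: int, alpha: float = 0.05) -> float:
    # Per-test level that makes the family-wise rate exactly alpha
    # when the m tests are independent.
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def fwer_simulated(m: int, alpha: float = 0.05, reps: int = 50_000, seed: int = 0) -> float:
    # Under a true null a continuous p-value is Uniform(0, 1), so each test
    # rejects with probability alpha; count replicates with any rejection.
    rng = random.Random(seed)
    hits = sum(any(rng.random() <= alpha for _ in range(m)) for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    print(round(fwer_independent(20), 4))               # 0.6415
    print(round(fwer_independent(20, 0.05 / 20), 4))    # 0.0488 (Bonferroni level)
    print(round(fwer_independent(20, sidak_level(20)), 4))  # 0.05 (Sidak level)
    print(fwer_simulated(20))  # Monte Carlo estimate, close to 0.6415
```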