Sensitivity and Specificity

How well a test catches positives and negatives

Sensitivity is the proportion of people with the condition whom a test correctly identifies as positive. Specificity is the proportion of people without the condition whom the test correctly identifies as negative. A highly sensitive test rarely misses cases; a highly specific test rarely raises false alarms. For a given test, the two measures trade off against each other as the decision threshold moves, so researchers must consider which type of error carries the greater practical cost.

Definition and Formulas

Sensitivity (Se) and specificity (Sp) measure a test's accuracy along two distinct dimensions. The four core cells of a confusion matrix are: true positive (TP), false negative (FN), false positive (FP), and true negative (TN). Sensitivity is calculated as: Se = TP / (TP + FN). This is the proportion of truly positive individuals the test correctly labels as positive. Specificity is: Sp = TN / (TN + FP). This is the proportion of truly negative individuals the test correctly labels as negative. Both values range from 0 to 1, where 1.0 indicates perfect performance.
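As a minimal sketch, the two formulas can be computed directly from the four confusion-matrix counts in Python. The ConfusionCounts class and the counts used below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives: condition present, test positive
    fn: int  # false negatives: condition present, test negative
    fp: int  # false positives: condition absent, test positive
    tn: int  # true negatives: condition absent, test negative


def sensitivity(c: ConfusionCounts) -> float:
    """Se = TP / (TP + FN): share of truly positive cases labelled positive."""
    if c.tp + c.fn == 0:
        raise ValueError("no people with the condition in the sample")
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Sp = TN / (TN + FP): share of truly negative cases labelled negative."""
    if c.tn + c.fp == 0:
        raise ValueError("no people without the condition in the sample")
    return c.tn / (c.tn + c.fp)


# Hypothetical counts for illustration only (not from any study).
counts = ConfusionCounts(tp=90, fn=10, fp=30, tn=170)
print(f"Se = {sensitivity(counts):.2f}")  # Se = 0.90
print(f"Sp = {specificity(counts):.2f}")  # Sp = 0.85
```

Note that each measure uses only one row of the matrix: sensitivity never looks at FP or TN, and specificity never looks at TP or FN.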

Decision Threshold and Trade-off

The decision threshold controls the trade-off between sensitivity and specificity. Assuming higher test scores indicate the condition, lowering the threshold classifies more individuals as positive, which increases sensitivity at the expense of specificity; raising the threshold has the opposite effect. The receiver operating characteristic (ROC) curve is the standard way to visualize this trade-off: it plots sensitivity (vertical axis) against 1 minus specificity, the false positive rate (horizontal axis), across all possible thresholds. The area under the curve (AUC) summarizes overall discriminative performance, from 0.5 for a test no better than chance to 1.0 for perfect discrimination. The choice of threshold should be guided by context, in particular by whether false negatives or false positives carry greater practical or ethical consequences.
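The threshold sweep behind an ROC curve can be sketched as follows. This assumes higher scores indicate the condition and that a case is called positive when its score is at or above the threshold; the scores and labels are made-up illustration data, and the AUC is computed with the trapezoidal rule, which is one common choice among several.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(1 - Sp, Se) pairs as the threshold moves from the highest score down.

    Assumption: higher score = more likely positive; a case is called
    positive when score >= threshold. labels: 1 = condition present, 0 = absent.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nobody is called positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))  # (false positive rate, sensitivity)
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve, summing trapezoids between consecutive points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up scores and true labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

pts = roc_points(scores, labels)
for fpr, se in pts:
    print(f"Se = {se:.2f}, Sp = {1 - fpr:.2f}")
print(f"AUC = {auc_trapezoid(pts):.4f}")  # AUC = 0.8125
```

Reading the points in order shows the trade-off: at a threshold of 0.8 this sketch gives Se = 0.50 and Sp = 1.00, while at 0.4 it gives Se = 1.00 and Sp = 0.50.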

Common Misconceptions

A common misunderstanding is that sensitivity and specificity are simply opposites of each other. In fact, they are computed on different subgroups (sensitivity among those with the condition, specificity among those without), so neither can be derived from the other. Another frequent error is confusing them with positive or negative predictive value (PPV or NPV), which give the probability that a positive or negative result is correct. Because each is computed within a single group, sensitivity and specificity do not depend directly on disease prevalence, whereas PPV and NPV vary with prevalence. It is also misleading to assume that high sensitivity implies high accuracy: a test that labels everyone as positive achieves 100% sensitivity but 0% specificity. Finally, these concepts are not exclusive to medical diagnosis; they apply equally to classification models in machine learning (where sensitivity is often called recall or the true positive rate) and in the social sciences.
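The prevalence dependence of predictive values follows from Bayes' theorem. As a sketch, the Python below holds a hypothetical test's sensitivity and specificity fixed at 0.90 and varies only the prevalence; the function names ppv and npv are illustrative.

```python
def ppv(se: float, sp: float, prevalence: float) -> float:
    """P(condition | positive result), by Bayes' theorem."""
    true_pos = se * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(se: float, sp: float, prevalence: float) -> float:
    """P(no condition | negative result), by Bayes' theorem."""
    true_neg = sp * (1 - prevalence)
    false_neg = (1 - se) * prevalence
    return true_neg / (true_neg + false_neg)


# The same hypothetical test (Se = Sp = 0.90) applied at three prevalences.
for p in (0.50, 0.10, 0.01):
    print(f"prevalence {p:.2f}: PPV = {ppv(0.9, 0.9, p):.3f}, NPV = {npv(0.9, 0.9, p):.3f}")
# prevalence 0.50: PPV = 0.900, NPV = 0.900
# prevalence 0.10: PPV = 0.500, NPV = 0.988
# prevalence 0.01: PPV = 0.083, NPV = 0.999
```

Sensitivity and specificity are unchanged throughout, yet at 1% prevalence fewer than one in ten positive results is a true case.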

Reporting and Research Importance

When reporting sensitivity and specificity, always present both together; reporting only one can be misleading. Include confidence intervals and show that the sample size is adequate, bearing in mind that sensitivity is estimated only from the people who have the condition, so a small number of cases limits its precision. For screening tests, high sensitivity is the priority because the goal is to avoid missing cases; for confirmatory tests, high specificity matters more because the goal is to avoid false positives. Pre-specify the threshold and justify it on contextual grounds. The STARD (Standards for Reporting of Diagnostic Accuracy Studies) guidelines describe how these measures should be reported in diagnostic accuracy studies. To evaluate a test's practical utility, interpret sensitivity and specificity alongside an estimate of prevalence in the intended population, since predictive values depend on it.
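The text does not prescribe a method for the confidence intervals. As one common option, the Wilson score interval for a proportion can be sketched as follows; the function name wilson_interval and the example counts are hypothetical, and z = 1.96 is assumed for approximately 95% coverage.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    For sensitivity, successes = TP and n = TP + FN; for specificity,
    successes = TN and n = TN + FP. z = 1.96 gives roughly 95% coverage.
    """
    if n == 0:
        raise ValueError("no observations in this group")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return center - half_width, center + half_width


# Hypothetical study: 90 of 100 people with the condition test positive.
lo, hi = wilson_interval(90, 100)
print(f"Se = 0.90, 95% CI {lo:.3f} to {hi:.3f}")  # Se = 0.90, 95% CI 0.826 to 0.945
```

With only 100 people in the condition group, the interval spans roughly twelve percentage points, which illustrates why the number of true cases limits how precisely sensitivity can be estimated.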

Sources

Altman, D. G., & Bland, J. M. (1994). Diagnostic tests 1: sensitivity and specificity. BMJ, 308(6943), 1552. DOI: 10.1136/bmj.308.6943.1552