ScholarGate

Screening and Diagnostic Test Evaluation

Screening and diagnostic test evaluation is the branch of epidemiology that quantifies how well a test distinguishes people who have a target condition from those who do not. It supplies the measures — sensitivity, specificity, predictive values, likelihood ratios, and the receiver operating characteristic curve — used to judge a test against a reference standard and to anticipate how it will behave when applied to a population.

Definition

Screening and diagnostic test evaluation is the systematic measurement of a test's ability to classify subjects by true disease status, expressed through accuracy indices computed from a cross-tabulation of test results against a reference standard.
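
As a minimal sketch of how those indices fall out of the cross-tabulation, the Python below computes them from the four cell counts. The class and function names and the example counts are illustrative assumptions, not part of any standard library or reported study, and the sketch omits confidence intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-tabulating test result against the reference standard."""
    tp: int  # test positive, condition present
    fp: int  # test positive, condition absent
    fn: int  # test negative, condition present
    tn: int  # test negative, condition absent


def accuracy_indices(t: TwoByTwo) -> dict[str, float]:
    """Return the standard accuracy indices for one 2x2 table.

    Assumes every margin is non-zero and specificity is below 1;
    otherwise a division by zero occurs.
    """
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = TwoByTwo(tp=90, fp=50, fn=10, tn=850)
indices = accuracy_indices(example)
```

With these made-up counts, sensitivity is 90/100 and the positive predictive value is 90/140, which shows that the two quantities use different denominators from the same table.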

Scope

This area orients the reader to the core accuracy metrics derived from the comparison of a test against a reference ("gold") standard, the distinction between intrinsic test properties and population-dependent predictive performance, the role of disease prevalence, and the reporting standards for diagnostic accuracy studies. It is a methodological overview, not clinical guidance, and does not recommend any specific test or threshold for an individual.

Core questions

  • How often does a test correctly identify people who have the condition, and people who do not?
  • Given a positive or negative result, how likely is the condition actually present or absent?
  • How does the prevalence of the condition in a population change the practical value of a test?
  • How should the trade-off between detecting true cases and avoiding false alarms be chosen and reported?

Key concepts

  • Reference (gold) standard
  • Sensitivity and specificity
  • Positive and negative predictive value
  • Likelihood ratios
  • Disease prevalence and pre-test probability
  • Receiver operating characteristic (ROC) curve
  • Diagnostic threshold and cut-off
  • Spectrum and verification bias

Mechanisms

Test evaluation begins by cross-classifying each subject's test result (positive or negative) against true disease status established by a reference standard, producing the four cells of a 2x2 table (true positives, false positives, false negatives, true negatives). Sensitivity and specificity are read down the columns of known disease status and are, in principle, properties of the test that do not depend on how common the condition is. Predictive values are read across the rows of test result and therefore depend on prevalence, because the same test applied where disease is rare yields more false positives relative to true positives. Likelihood ratios combine sensitivity and specificity into factors that update pre-test odds to post-test odds. When a test produces a continuous or ordinal measurement, moving the decision threshold trades sensitivity against specificity; plotting that trade-off across all thresholds yields the ROC curve, whose area summarises discrimination independently of any single cut-off.
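
The threshold trade-off described above can be sketched by sweeping every observed value as a cut-off and recording the resulting sensitivity and false positive rate (1 minus specificity). This is a rough illustration, assuming higher scores indicate the condition; the function names and data are hypothetical, and an established statistics library would normally be used instead.

```python
def roc_points(scores, has_condition):
    """Return (false positive rate, sensitivity) pairs, one per threshold.

    A subject is called test-positive when score >= threshold. Assumes
    higher scores indicate the condition and that both groups are non-empty.
    """
    n_pos = sum(has_condition)
    n_neg = len(has_condition) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nobody test-positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(s >= threshold and d for s, d in zip(scores, has_condition))
        fp = sum(s >= threshold and not d for s, d in zip(scores, has_condition))
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical measurements and reference-standard results (True = condition present).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
has_condition = [False, False, True, False, True, False, True, True]

curve = roc_points(scores, has_condition)
area = auc(curve)
```

For this toy data the area is 0.8125, which equals the proportion of diseased and non-diseased pairs in which the diseased subject has the higher score (13 of 16).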

Clinical relevance

These measures are the common language for appraising whether a screening or diagnostic test is fit for purpose and for comparing competing tests on equal terms. Understanding them is central to critical appraisal of the diagnostic literature; the area explains how diagnostic evidence is generated and interpreted and is not a basis for individual diagnostic or treatment decisions.

Epidemiology

Accuracy metrics underpin decisions about population screening programmes, where the consequences of false positives and false negatives at scale, together with disease prevalence, determine whether screening does more good than harm. Reporting standards such as STARD were developed to improve the completeness and transparency of diagnostic accuracy studies, and biases of spectrum and verification are recognised threats to the validity of reported accuracy.

Evidence & guidelines

The STARD statement provides a checklist for transparent reporting of diagnostic accuracy studies and is widely endorsed by biomedical journals.

History

Formal evaluation of diagnostic tests grew out of mid-twentieth-century work on signal detection and clinical decision making and was sharpened by recognition in the 1970s that biased study design could inflate apparent accuracy. Accessible accounts of the accuracy measures were popularised in the medical literature during the 1990s, and reporting standards were consolidated in the STARD statement, first published in 2003 and updated in 2015.

Debates

Why can a highly accurate-sounding test still mislead in screening?
Because predictive values depend on prevalence, a test with high sensitivity and specificity can still generate many false positives when applied to a low-prevalence screening population, a recurring source of misinterpretation.
How much do study design biases distort reported accuracy?
Spectrum bias and verification bias can substantially distort measured sensitivity and specificity, most often in the direction of overstating accuracy, so reported figures must be interpreted in light of how cases and controls were selected and how the reference standard was applied.

Key figures

  • Douglas Altman
  • Jonathan Deeks
  • David Grimes
  • Kenneth Schulz
  • Patrick Bossuyt

Seminal works

  • ransohoff-feinstein-1978
  • altman-bland-1994a
  • altman-bland-1994b
  • bossuyt-2015

Frequently asked questions

What is the difference between a screening test and a diagnostic test?
A screening test is applied to apparently healthy people to identify those more likely to have a condition, usually favouring sensitivity, while a diagnostic test is used to confirm or exclude disease in people already suspected of it; both are evaluated with the same accuracy measures against a reference standard.
Why does prevalence matter for a test's usefulness?
Sensitivity and specificity describe the test itself, but the chance that a positive result is correct (positive predictive value) falls as the condition becomes rarer, so the same test can be informative in a high-prevalence clinic and misleading in a low-prevalence screening setting.
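
The effect of prevalence can be illustrated with the odds form of Bayes' theorem, in which pre-test odds are multiplied by the positive likelihood ratio to give post-test odds. The sketch below is an illustration only: the function name, the 95% sensitivity and specificity, and the two prevalence figures are assumptions chosen for the example, not values from any real test.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV via the odds form of Bayes' theorem.

    post-test odds = pre-test odds * LR+, with LR+ = sens / (1 - spec).
    Assumes 0 < prevalence < 1 and specificity < 1.
    """
    lr_positive = sensitivity / (1 - specificity)
    pre_test_odds = prevalence / (1 - prevalence)
    post_test_odds = pre_test_odds * lr_positive
    return post_test_odds / (1 + post_test_odds)


# Hypothetical test: 95% sensitive and 95% specific in both settings.
ppv_clinic = positive_predictive_value(0.95, 0.95, prevalence=0.30)
ppv_screening = positive_predictive_value(0.95, 0.95, prevalence=0.01)
```

Under these assumptions the positive predictive value is about 0.89 at 30% prevalence but only about 0.16 at 1% prevalence, even though the test itself is unchanged.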
