Measurement Validity and Reliability

Validity and reliability are the two core properties that determine whether a quality measure can be trusted. Validity concerns whether a measure actually captures the aspect of quality it claims to capture; reliability concerns whether the measure produces consistent results when applied repeatedly under the same conditions. A measure must be both reliable and valid before its results justify judgements about quality or decisions to act.

Definition

Reliability is the degree to which a measure yields consistent, reproducible results across repeated applications, raters, or items; validity is the degree to which a measure accurately reflects the underlying construct, here an aspect of healthcare quality, that it is intended to assess.

Scope

This entry covers the principal forms of validity and reliability as they apply to quality indicators and instruments, the statistics commonly used to quantify them, and why both properties matter for measurement that carries consequences. It is a methodological reference and does not provide clinical scoring thresholds for any specific instrument.

Core questions

What does it mean for a quality measure to be valid, and how is validity assessed?
How is reliability distinguished from validity, and why are both necessary?
Which statistics quantify internal consistency and inter-rater agreement?
How do poor validity or reliability mislead judgements about quality?

Key concepts

Content validity
Construct validity
Criterion validity
Internal consistency (Cronbach's alpha)
Inter-rater reliability (Cohen's kappa)
Test-retest reliability
Measurement error and random variation

Key theories

Classical test theory of reliability: Classical test theory frames an observed measurement as the sum of a true value and random error, so reliability is the proportion of observed variance attributable to true differences rather than error. Cronbach's coefficient alpha operationalises one form of this as the internal consistency among items intended to measure the same construct.

Mechanisms

Reliability is assessed by examining the consistency of measurement across repetitions: internal consistency among items, agreement between raters, and stability over time when the underlying state has not changed. Cronbach's alpha summarises internal consistency for multi-item scales, while Cohen's kappa quantifies agreement between two raters on categorical judgements, correcting for chance agreement. Validity is assessed by accumulating evidence that the measure reflects its intended construct: content validity (comprehensive coverage of the concept), construct validity (expected relationships with other measures), and criterion validity (agreement with a reference standard). A measure can be reliable yet invalid, consistently measuring the wrong thing, but it cannot be valid without being reliable, because random error caps how well a measure can track its target.
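The chance correction described above can be sketched in a few lines. Kappa is computed here as (observed agreement minus expected agreement) divided by (1 minus expected agreement), where expected agreement comes from each rater's marginal category proportions. The function name `cohen_kappa` and the two rating lists are hypothetical, made up for illustration. The sketch covers the unweighted two-rater case only.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters judging the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of cases")
    n = len(rater_a)

    # Proportion of cases on which the two raters gave the same category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical audit: two reviewers judge whether 10 records meet an indicator.
reviewer_1 = ["met"] * 6 + ["not"] * 4
reviewer_2 = ["met"] * 5 + ["not"] * 4 + ["met"]
kappa = cohen_kappa(reviewer_1, reviewer_2)  # about 0.58
```

In this toy case the reviewers agree on 8 of 10 records (0.80), but 0.52 agreement would be expected by chance from their marginal proportions, so kappa is about 0.58 rather than 0.80.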

Clinical relevance

Before a quality indicator or patient-reported instrument is used for reporting, accreditation, or incentives, its validity and reliability must be established so that observed differences reflect real variation in quality rather than measurement noise. These properties are central to interpreting any quality measurement programme. This entry explains measurement properties and is not a source of clinical scoring rules for individual patients.

Evidence & guidelines

The statistical foundations come from Cronbach's coefficient alpha and Cohen's kappa, with applied guidance for health measurement consolidated in Streiner and Norman's text. Indicator-classification guidance situates these properties within quality measurement. These sources are cited for their methodological content and are not clinical directives.

History

The concepts of validity and reliability were formalised within psychometrics in the mid-twentieth century, with Cronbach's 1951 alpha and Cohen's 1960 kappa becoming standard tools. As healthcare adopted patient-reported instruments and quality indicators, these psychometric principles were carried into healthcare measurement and codified in practical guides such as Streiner and Norman's.

Debates

Is Cronbach's alpha a sufficient measure of reliability?: Alpha is widely reported, but its value rises with the number of items and its interpretation assumes a single underlying dimension. A high alpha can therefore reflect item redundancy rather than good measurement, and it establishes neither unidimensionality nor validity, which has prompted calls for complementary evidence.
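The dependence on item count can be illustrated with the standardized form of alpha, which expresses it through the number of items k and the mean inter-item correlation r as k·r / (1 + (k − 1)·r). This is a simplified sketch that assumes standardized items summarised by one average correlation. The function name `standardized_alpha` and the value r = 0.3 are hypothetical choices for illustration.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha from item count k and mean inter-item correlation."""
    if k < 2:
        raise ValueError("alpha needs at least two items")
    return (k * mean_r) / (1 + (k - 1) * mean_r)


# Same modest inter-item correlation, increasing scale length.
mean_r = 0.3
for k in (5, 10, 20):
    print(k, round(standardized_alpha(k, mean_r), 2))
```

With the inter-item correlation held at 0.3, alpha climbs from about 0.68 with 5 items to about 0.90 with 20. The items are no more strongly related to one another in the longer scale, which is why a high alpha on its own says little about the quality of measurement.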

Key figures

Lee Cronbach
Jacob Cohen
David Streiner
Geoffrey Norman

Seminal works

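Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46.
Streiner, D. L., Norman, G. R., & Cairney, J. (2015). Health Measurement Scales: A Practical Guide to Their Development and Use (5th ed.). Oxford University Press.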

Frequently asked questions

Can a measure be reliable but not valid?: Yes. A measure can give highly consistent results while consistently capturing the wrong thing. Reliability is necessary for validity but does not guarantee it; a measure must also be shown to reflect the construct it claims to assess.
Why correct for chance when assessing inter-rater agreement?: Two raters will sometimes agree purely by chance, especially when there are few categories. Cohen's kappa adjusts observed agreement for the agreement expected by chance, so it estimates how far the raters agree beyond what chance alone would produce.