Validity of Measurement

Content, criterion and construct validity

Measurement validity asks whether an instrument truly measures what it claims to measure. Validity evidence is conventionally grouped into three main types: content validity evaluates whether the instrument covers the full domain of the construct; criterion validity compares scores against a gold standard, either concurrently or predictively; construct validity examines whether the measure behaves as theory predicts, drawing on convergent and discriminant evidence. Face validity is regarded as weak because it reflects only a surface-level judgment with no empirical grounding.

What Is Validity?

Validity refers to the degree to which a measurement instrument accurately reflects the property it is intended to measure. A reliable instrument is consistent, but consistency alone is insufficient: you can consistently measure the wrong thing. Validity is therefore a quality criterion that goes beyond reliability. Importantly, validity is not a fixed property that an instrument either has or lacks; it is a judgment, based on the totality of evidence, about how well the scores serve a specific purpose in a specific population. Researchers do not 'prove' validity but accumulate supporting evidence over time.

Main Types of Validity

Content validity evaluates whether the instrument represents all relevant dimensions of the construct; it is examined through systematic approaches such as expert panels and content validity ratios. Criterion validity compares scores against an accepted gold standard: in concurrent criterion validity both measurements are taken at the same time, while in predictive criterion validity the instrument should forecast a future outcome. Construct validity is the most comprehensive type; it requires convergent evidence (high correlations with theoretically related variables) alongside discriminant evidence (low correlations with unrelated variables). Face validity merely reflects whether an instrument appears intuitively meaningful and carries little scientific weight.
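
As a minimal sketch of how a content validity ratio can be computed, the Python snippet below uses Lawshe's formula, CVR = (n_e - N/2) / (N/2), where n_e is the number of experts who rate an item as essential and N is the panel size. The function name, the panel of 10 experts, and the vote counts are hypothetical and serve only as illustration; a real study may use a different index or rating scheme.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's content validity ratio for a single item.

    CVR = (n_e - N/2) / (N/2); the result ranges from -1 to +1.
    """
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must lie between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical data: number of "essential" ratings per item from 10 experts.
essential_votes = {"item_1": 10, "item_2": 8, "item_3": 5, "item_4": 2}

cvr_by_item = {
    item: content_validity_ratio(votes, 10)
    for item, votes in essential_votes.items()
}

for item, cvr in cvr_by_item.items():
    print(f"{item}: CVR = {cvr:+.2f}")
```

A value near +1 means nearly all panelists judged the item essential, 0 means exactly half did, and negative values mean fewer than half did. Which cut-off counts as acceptable depends on the panel size and the convention a study adopts.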

Validity in Practice: An Example

Suppose a researcher is developing a scale to measure academic self-efficacy. For content validity, experts independently rate how well each item covers the dimensions specified by self-efficacy theory. For criterion validity, the scale scores are compared with scores from an established self-efficacy instrument administered at the same time (concurrent), or correlated with subsequent academic achievement (predictive). For construct validity, the researcher examines whether the new scale correlates strongly with motivation and self-regulation scales (convergent) and, as theory predicts, more weakly with anxiety scales (discriminant). Presenting all three forms of evidence together provides robust support for the scale's validity.
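
The convergent and discriminant checks in this example come down to comparing correlations. The sketch below computes Pearson's r by hand for a tiny, entirely hypothetical data set of five respondents; the variable names and scores are invented for illustration, and a real analysis would need a far larger sample and would typically report confidence intervals as well.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical total scores for the same five respondents.
new_self_efficacy = [21, 24, 27, 30, 33]
motivation = [32, 36, 38, 36, 38]   # theoretically related construct
anxiety = [16, 12, 18, 14, 16]      # comparison construct for discriminant evidence

r_convergent = pearson_r(new_self_efficacy, motivation)
r_discriminant = pearson_r(new_self_efficacy, anxiety)

print(f"convergent r   = {r_convergent:.2f}")    # about 0.77 for these made-up scores
print(f"discriminant r = {r_discriminant:.2f}")  # about 0.14 for these made-up scores
```

The pattern that matters is the contrast: a clearly higher correlation with the related construct than with the comparison construct. The absolute sizes that count as "high" or "low" are a matter of theory and field convention, not of the code.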

Common Pitfalls and Good Practice

The most common mistake is presenting face validity or an internal consistency coefficient (Cronbach's alpha) alone as evidence of validity; alpha is an indicator of reliability, not validity. Another frequent error is overlooking the fact that validity resides in the use of the instrument, not in the instrument itself: the same scale may require different validity evidence for different populations. Good practice involves gathering evidence from multiple validity types, clearly describing sample characteristics, and reporting validity evidence together with its limitations. Confirmatory factor analysis is currently among the most widely used methods for examining construct validity.
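
To make the point about Cronbach's alpha concrete, the following sketch computes it from the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The response matrix is hypothetical, and the function name is chosen here only for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item (column) and of the total score per respondent.
    item_variances = [variance(column) for column in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents x 3 items on a 1-5 scale.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")  # about 0.96 for these made-up responses
```

Note that the calculation uses only the items' own variances and the variance of their sum. No external criterion or theoretical prediction enters the formula, which is why even a very high alpha shows that the items hang together, not that they measure the intended construct.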

Key terms

Content Validity: Degree to which an instrument covers all dimensions of the targeted construct.
Criterion Validity: Concurrent or predictive relationship of scores with an accepted gold-standard measure.
Construct Validity: Evidence, drawn from convergent and discriminant findings, that the instrument behaves as theory predicts.
Convergent Evidence: High observed correlations with theoretically related constructs, as theory predicts.
Face Validity: Superficial appearance of meaningfulness in an instrument; weak scientific evidence.