Psychological Testing and Psychometrics

Psychometrics is the science of measuring psychological attributes: how tests are constructed, how their scores are quantified, and how reliability, validity, and fairness are established so that a number derived from a test can be interpreted with confidence.

Definition

Psychometrics is the branch of psychology concerned with the theory and technique of psychological measurement, including the design, administration, scoring, and validation of tests and the statistical models that relate observed scores to underlying attributes.

Scope

This topic covers the theory and methods that turn responses into interpretable scores: classical test theory and the true-score model, reliability and measurement error, the validity framework, item-level analysis, norming and standardization, and test fairness. It is a methodological entry on measurement, not guidance on choosing or scoring tests for any individual.

Core questions

How much of an observed score reflects the attribute versus measurement error?
What evidence is needed before a score can be interpreted as measuring an intended construct?
How are test items analysed, selected, and scaled?
How are scores made comparable across people through norms and standardization?
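The last question can be illustrated with a minimal sketch, assuming a small hypothetical norm sample that stands in for the reference population. The function name `standard_scores`, the sample values, and the T-score convention (mean 50, SD 10) are assumptions chosen for illustration; the percentile additionally assumes scores are normally distributed in the norm group.

```python
from statistics import NormalDist, mean, pstdev

def standard_scores(raw, norm_sample):
    """Express a raw score relative to a norm sample (hypothetical helper)."""
    mu = mean(norm_sample)
    sigma = pstdev(norm_sample)  # norm sample treated as the whole reference population
    z = (raw - mu) / sigma       # distance from the norm mean in SD units
    t_score = 50 + 10 * z        # T-score convention: mean 50, SD 10
    percentile = 100 * NormalDist().cdf(z)  # assumes normality in the norm group
    return {"z": z, "T": t_score, "percentile": percentile}

# Hypothetical norm sample with mean 20 and SD 4.
norm_sample = [14, 18, 20, 22, 26]
result = standard_scores(24, norm_sample)
print(result)
```

A raw score of 24 lies one standard deviation above this sample's mean, so the same score would map to a different standard score against a different norm group.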

Key concepts

True score and measurement error
Reliability (internal consistency, test-retest, inter-rater)
Content, criterion, and construct validity
Item analysis and difficulty/discrimination
Norms, standardization, and standard scores
Item response theory
Measurement invariance and test fairness
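Several of these concepts reduce to simple statistics on a persons-by-items score matrix. The following is a minimal sketch, assuming dichotomous (0/1) items and a tiny hypothetical data set: item difficulty is taken as the proportion answering correctly, discrimination as the correlation between an item and the total of the remaining items, and internal consistency as Cronbach's alpha. The function names and the data are illustrative assumptions, not part of any particular instrument or library.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def item_analysis(responses):
    """responses: list of persons, each a list of 0/1 item scores."""
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # Remove the item from the total so it is not correlated with itself.
        rest = [t - i for t, i in zip(totals, item)]
        stats.append({
            "difficulty": mean(item),               # proportion correct
            "discrimination": pearson(item, rest),  # corrected item-total r
        })
    return stats

def cronbach_alpha(responses):
    """Internal consistency from item variances and total-score variance."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 6 persons x 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
stats = item_analysis(responses)
alpha = cronbach_alpha(responses)
print(stats, alpha)
```

With so few persons the estimates are unstable; the sketch only shows how the quantities are defined.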

Key theories

Classical test theory: An observed score is modeled as the sum of a true score and random error, and reliability is defined as the proportion of observed-score variance attributable to true-score variance; Lord and Novick gave this model its rigorous statistical formulation.
Unified construct validity: Cronbach and Meehl framed validity around the construct that test scores are taken to reflect, and Messick unified content, criterion, and construct evidence into a single argument about the justification and consequences of score interpretation.
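The classical test theory entry above can be written compactly. The symbols below are notational assumptions: X is the observed score, T the true score, E the random error, and \rho_{XX'} the reliability coefficient. The decomposition of variance relies on the standard assumptions that errors have zero mean and are uncorrelated with true scores.

```latex
\begin{aligned}
X &= T + E, \qquad \mathbb{E}[E] = 0, \qquad \operatorname{Cov}(T, E) = 0 \\
\sigma_X^2 &= \sigma_T^2 + \sigma_E^2 \\
\rho_{XX'} &= \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
\end{aligned}
```

Reliability therefore ranges from 0 (all observed variance is error) to 1 (no error variance).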

Mechanisms

In the classical model an observed score is treated as a true score plus random error, and reliability quantifies the share of observed-score variance that is true-score variance; Lord and Novick formalized this model, and their monograph also laid out early item response models. Validity is the warrant that a score supports an intended inference: Cronbach and Meehl located it in the construct and its nomological network, Haynes and colleagues detailed content validity as the systematic match of items to the target domain, and Messick unified the evidence types into an argument that also weighs the consequences of interpretation. Norms and standardization make scores comparable by referencing them to a defined population.
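The true-score-plus-error idea can be checked with a small simulation. This is a minimal sketch under stated assumptions: true scores and errors are drawn from normal distributions with hypothetical standard deviations of 10 and 5, and two "parallel forms" share the same true score but have independent errors. The function name `simulate_ctt` and all parameter values are illustrative.

```python
import random
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def simulate_ctt(n=20000, true_sd=10.0, error_sd=5.0, seed=0):
    """Simulate observed = true + error for two parallel forms."""
    rng = random.Random(seed)
    true = [rng.gauss(50.0, true_sd) for _ in range(n)]
    # Each form adds its own independent random error to the same true score.
    form_a = [t + rng.gauss(0.0, error_sd) for t in true]
    form_b = [t + rng.gauss(0.0, error_sd) for t in true]
    variance_ratio = pvariance(true) / pvariance(form_a)  # true / observed variance
    parallel_r = pearson(form_a, form_b)                  # parallel-forms correlation
    return variance_ratio, parallel_r

variance_ratio, parallel_r = simulate_ctt()
expected = 10.0 ** 2 / (10.0 ** 2 + 5.0 ** 2)  # theoretical reliability
print(variance_ratio, parallel_r, expected)
```

Both estimates should land close to the theoretical value, which illustrates why the correlation between parallel forms can serve as a reliability estimate even though true scores are never observed directly.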

Clinical relevance

Psychometric properties determine whether a clinical test score can be trusted and what it may be taken to mean, so reliability and validity evidence underpin every defensible use of testing in clinical psychology. This entry explains those properties as measurement concepts; it does not recommend specific instruments or cutoffs for any person.

Evidence & guidelines

The Standards for Educational and Psychological Testing codify expectations for reliability, validity, and fairness in test development and use. Cronbach and Meehl, Messick, and Haynes and colleagues are standard methodological references for the validity framework, and Lord and Novick is the canonical statement of classical test theory and an early foundation for item response theory.

History

Mental measurement emerged from nineteenth-century work on individual differences and was systematized as classical test theory in the first half of the twentieth century. Cronbach and Meehl's 1955 paper made construct validity central, Lord and Novick's 1968 monograph gave the field a rigorous statistical and item response foundation, and Messick's later synthesis unified the validity concept around the justification of inferences and their social consequences.

Debates

Is validity a property of tests or of inferences? The field largely moved from speaking of valid tests to validating the inferences and uses drawn from scores, with continued discussion about how far consequences of testing belong inside the validity concept.

Key figures

Lee Cronbach
Paul Meehl
Samuel Messick
Frederic Lord
Melvin Novick

Seminal works

Cronbach and Meehl (1955)
Lord and Novick (1968)
Messick (1995)

Frequently asked questions

What is the difference between reliability and validity? Reliability is the consistency of a measurement (how little it is affected by random error), while validity is whether the inference drawn from a score is justified; a test can be reliable without being valid for a given purpose, but it cannot be valid without being reasonably reliable.
What does construct validity mean? It is the degree to which a test can be interpreted as measuring an intended, theoretically defined attribute, established by accumulating evidence that the test relates to other variables as the theory predicts.
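The claim in the first answer that a test cannot be valid without being reasonably reliable has a simple expression in the classical model when validity evidence takes the form of a correlation with a criterion. The notation is an assumption for illustration: X is the test, Y the criterion, T_X and T_Y their true scores, and \rho_{XX'} and \rho_{YY'} their reliabilities, with errors assumed uncorrelated with each other and with true scores.

```latex
\rho_{XY} = \rho_{T_X T_Y}\,\sqrt{\rho_{XX'}\,\rho_{YY'}}
\quad\Longrightarrow\quad
\lvert \rho_{XY} \rvert \le \sqrt{\rho_{XX'}\,\rho_{YY'}} \le \sqrt{\rho_{XX'}}
```

For example, a test with reliability 0.49 cannot correlate above 0.70 with any criterion under these assumptions, however well the construct is defined.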