Evidence Evaluation and Critical Appraisal
Evidence evaluation and critical appraisal is the disciplined judgement of whether a study or body of evidence is valid, what its results mean, and whether they apply to a given question. It is the core analytic skill of evidence-based medicine, separating the reliability of evidence from the loudness of its claims.
Definition
Critical appraisal is the systematic process of examining research to judge its internal validity (freedom from bias), the size and precision of its results, and its external validity (applicability), so the trustworthiness of the evidence can be established before it is used.
Scope
This topic covers the hierarchy of evidence, the structured assessment of risk of bias in individual studies, the grading of the certainty of a body of evidence, and the judgement of applicability. It is a methodological and reference topic about how evidence is judged, not a source of treatment instructions.
Core questions
- Is the study's design and conduct free of important bias?
- What is the size and precision of the reported effect?
- How certain is the overall body of evidence?
- Do the results apply to the patients or question at hand?
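The second question, on the size and precision of an effect, can be made concrete with a small calculation. The sketch below computes a risk ratio and an approximate 95% confidence interval from a two-arm trial using the standard log-scale method. The function name `risk_ratio_ci` and the trial counts are hypothetical and chosen only for illustration.

```python
import math


def risk_ratio_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Return (risk ratio, lower bound, upper bound) for a two-arm comparison.

    events_t, n_t -- events and participants in the treatment arm
    events_c, n_c -- events and participants in the control arm
    z             -- normal quantile; 1.96 gives an approximate 95% interval
    """
    if min(events_t, events_c) <= 0:
        # The log method is undefined with zero events; a real analysis
        # would need a continuity correction or an exact method instead.
        raise ValueError("each arm needs at least one event")

    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of log(RR).
    se_log_rr = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical data: 15/100 events on treatment, 30/100 on control.
rr, lower, upper = risk_ratio_ci(15, 100, 30, 100)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these made-up counts the risk ratio is 0.50 and the interval runs from about 0.29 to 0.87. The point estimate is the size of the effect; the width of the interval is its precision, and a wide interval is one reason an appraiser may judge the evidence imprecise.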
Key concepts
- Internal validity and risk of bias
- External validity and applicability
- Hierarchy of evidence
- Certainty (quality) of evidence
- Effect size and precision
- Structured appraisal tools (RoB 2, AMSTAR 2)
Mechanisms
Appraisal proceeds from the individual study to the body of evidence. For a randomised trial, structured tools such as RoB 2 examine the domains where bias can enter: the randomisation process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result. For a systematic review, AMSTAR 2 assesses methodological quality. Across studies, the GRADE framework rates the certainty of a body of evidence as high, moderate, low, or very low. The rating is lowered for risk of bias, inconsistency, indirectness, imprecision, and publication bias, and it can be raised for features such as a large effect. This certainty rating then informs the move from evidence to recommendation. Underlying all of this is the evidence-based medicine principle that external evidence must be appraised before it is integrated with clinical expertise.
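The way a GRADE rating moves up and down can be sketched as simple bookkeeping. The example below is an illustration only, not the GRADE method itself: real ratings are structured judgements, not arithmetic. It assumes the usual GRADE convention that evidence from randomised trials starts at high and evidence from observational studies starts at low. The function name `grade_certainty` and the domain keys are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]

# The five reasons for lowering certainty named in the text above.
DOWNGRADE_DOMAINS = (
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
)


def grade_certainty(randomised, downgrades, upgrades=0):
    """Return a GRADE-style certainty label for a body of evidence.

    randomised -- True if the evidence comes from randomised trials
                  (assumed to start at "high"); otherwise starts at "low"
    downgrades -- dict mapping a domain name to levels removed (0, 1 or 2)
    upgrades   -- levels added for features such as a large effect
    """
    unknown = set(downgrades) - set(DOWNGRADE_DOMAINS)
    if unknown:
        raise ValueError(f"unknown domain(s): {sorted(unknown)}")
    if any(levels not in (0, 1, 2) for levels in downgrades.values()):
        raise ValueError("each downgrade must be 0, 1 or 2 levels")

    start = 3 if randomised else 1
    score = start - sum(downgrades.values()) + upgrades
    # Clamp to the four available levels.
    score = max(0, min(score, len(LEVELS) - 1))
    return LEVELS[score]


# Trials with serious risk of bias and serious imprecision.
print(grade_certainty(True, {"risk_of_bias": 1, "imprecision": 1}))
# Observational evidence raised one level for a large effect.
print(grade_certainty(False, {}, upgrades=1))
```

The first call prints "low" and the second prints "moderate". Keeping the downgrades per domain, instead of as a single total, mirrors how a rating is usually reported: each lowered level is tied to a named reason that a reader can inspect and challenge.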
Clinical relevance
Critical appraisal determines how much weight a piece of evidence should carry in formulary decisions, guideline development, and answers to drug information questions. It is a reference skill for weighing the medicines literature: it describes how evidence is judged and does not itself direct individual diagnosis or therapy.
Evidence & guidelines
Several widely adopted instruments standardise appraisal: the Cochrane RoB 2 tool for risk of bias in randomised trials, AMSTAR 2 for the methodological quality of systematic reviews, and the GRADE framework for rating the certainty of a body of evidence and the strength of recommendations. These tools are maintained by their developer groups and updated as methods evolve.
History
Critical appraisal was formalised by the clinical epidemiology and evidence-based medicine movements of the 1980s and 1990s, with Sackett and colleagues articulating its principles. Structured instruments followed: the Cochrane risk-of-bias tools (revised as RoB 2), AMSTAR for systematic-review quality (revised as AMSTAR 2), and the GRADE approach to grading certainty, which together replaced informal judgement with explicit, reproducible criteria.
Key figures
- David Sackett
- Gordon Guyatt
- Jonathan Sterne
- Beverley Shea
Seminal works
- sackett-1996
- guyatt-2008-grade
- sterne-2019-rob2
- shea-2017-amstar2
Frequently asked questions
- What is the difference between internal and external validity?
- Internal validity is whether a study's result is free of bias and reflects a true effect in its own sample; external validity is whether that result applies to other patients, settings, or questions.
- Does a higher place in the evidence hierarchy guarantee a better answer?
- No. Study design sets the potential strength of evidence, but a poorly conducted trial or review can still be biased. This is why structured risk-of-bias appraisal is applied to each study, and certainty grading to the body of evidence, whatever the design.