Risk of Bias Assessment

Risk of bias assessment is the structured appraisal of how likely a study's design, conduct, and reporting are to have distorted its results away from the truth. Unlike an evidence hierarchy, which ranks designs in general, it judges an individual study, asking whether features such as how participants were allocated, blinded, retained, and analysed could have biased the estimated effect.

Definition

Risk of bias assessment is a domain-based evaluation of the internal validity of an individual study, judging for each relevant domain whether flaws in design, conduct, or reporting are likely to have produced a systematic error in the estimated effect.

Scope

The entry covers the concept of bias as systematic error, the standard domains assessed in randomised and non-randomised studies, and the principal Cochrane tools used to make these judgements. It is a methodological reference on study-level appraisal, not clinical guidance.

Key concepts

Bias as systematic (not random) error
Internal validity
Selection bias / randomisation and allocation concealment
Performance and detection bias / blinding
Attrition bias / incomplete outcome data
Reporting bias / selective outcome reporting
Domain-based judgement (low / some concerns or unclear / high risk)
Confounding in non-randomised studies

Mechanisms

Assessment proceeds by domains, each capturing a route by which systematic error can enter. In randomised trials these include the randomisation process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result. For each domain, a reviewer judges the risk as low, of some concern (or unclear), or high, often guided by signalling questions, and then reaches an overall judgement for the study. Non-randomised studies of interventions add confounding and participant selection as central domains, since without randomisation these are the dominant threats. The product is a transparent, reproducible appraisal, rather than a single summary score, that feeds into evidence synthesis and certainty rating.
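The domain-by-domain logic described above can be sketched in a few lines of Python. This is only an illustration: the names `Risk`, `RCT_DOMAINS`, `overall_judgement`, and `flagged_domains` are hypothetical, and the "worst domain wins" rule is a simplifying assumption. Published tools leave room for reviewer judgement when moving from domain-level to overall ratings, so this should not be read as the algorithm of any specific tool.

```python
from enum import IntEnum
from typing import List, Mapping


class Risk(IntEnum):
    """Ordered judgement levels; a higher value means a greater risk of bias."""
    LOW = 0
    SOME_CONCERNS = 1
    HIGH = 2


# Randomised-trial domains as named in the text above.
RCT_DOMAINS = (
    "randomisation process",
    "deviations from intended interventions",
    "missing outcome data",
    "measurement of the outcome",
    "selection of the reported result",
)


def overall_judgement(domains: Mapping[str, Risk]) -> Risk:
    """Return the worst domain-level judgement.

    ASSUMPTION: "worst domain wins" is a simplification for illustration,
    not the decision rule of any published tool.
    """
    missing = [d for d in RCT_DOMAINS if d not in domains]
    if missing:
        raise ValueError(f"unjudged domains: {missing}")
    return max(domains[d] for d in RCT_DOMAINS)


def flagged_domains(domains: Mapping[str, Risk]) -> List[str]:
    """List the domains that are not judged low risk, in the standard order."""
    return [d for d in RCT_DOMAINS if domains[d] != Risk.LOW]


# Hypothetical trial: sound randomisation, but substantial loss to follow-up.
trial = {
    "randomisation process": Risk.LOW,
    "deviations from intended interventions": Risk.LOW,
    "missing outcome data": Risk.HIGH,
    "measurement of the outcome": Risk.SOME_CONCERNS,
    "selection of the reported result": Risk.LOW,
}

print(overall_judgement(trial).name)  # HIGH
print(flagged_domains(trial))
```

Keeping the per-domain judgements alongside the overall rating is the point of the design: the overall label says how worried to be, while the flagged domains say why.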

Clinical relevance

Risk-of-bias judgements explain why two studies of the same question may be weighted differently and why a body of evidence may be downgraded for study limitations. They help readers see whether a result is likely to reflect a real effect or an artefact of how the study was run; the entry describes appraisal methodology and is not a basis for individual clinical decisions.

Evidence & guidelines

The Cochrane risk-of-bias tool (Higgins et al., 2011) standardised domain-based appraisal of randomised trials and was superseded by RoB 2 (Sterne et al., 2019), which restructured the domains and added signalling questions. ROBINS-I (Sterne et al., 2016) extended the approach to non-randomised studies of interventions, emphasising confounding and selection. In GRADE, study-level risk of bias is one of the five factors that can lower the certainty of a body of evidence, alongside inconsistency, indirectness, imprecision, and publication bias (Guyatt et al., 2008).

History

Quality scoring of trials in the 1980s and 1990s relied on numeric scales whose components and weights varied widely. The Cochrane Collaboration shifted appraisal toward explicit, domain-based judgement with its 2011 risk-of-bias tool, prioritising transparency over summary scores. ROBINS-I (2016) brought a parallel, confounding-centred framework to non-randomised studies, and RoB 2 (2019) then refined the randomised-trial domains and introduced signalling questions.

Debates

Domain-based judgement versus numeric quality scores: Composite quality scores can obscure which specific flaws matter and how heavily, so modern tools favour transparent per-domain judgements; critics note that domain judgements still require subjective calls and can vary between assessors.
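A small sketch may make the objection to composite scores concrete. It assumes a hypothetical five-item checklist scored one point per item met; the items, the equal weights, and both studies are invented for illustration and do not correspond to any published scale.

```python
# ASSUMPTION: a made-up checklist with equal weights, for illustration only.
ITEMS = (
    "randomised",
    "allocation concealed",
    "blinded",
    "complete follow-up",
    "all outcomes reported",
)


def composite_score(study):
    """Collapse the checklist into a single number (one point per item met)."""
    return sum(1 for item in ITEMS if study[item])


def unmet_items(study):
    """Per-item view: which specific safeguards are missing."""
    return [item for item in ITEMS if not study[item]]


# Two hypothetical studies, each with a different single flaw.
study_a = {
    "randomised": True,
    "allocation concealed": True,
    "blinded": False,
    "complete follow-up": True,
    "all outcomes reported": True,
}
study_b = {
    "randomised": True,
    "allocation concealed": False,
    "blinded": True,
    "complete follow-up": True,
    "all outcomes reported": True,
}

print(composite_score(study_a), composite_score(study_b))  # 4 4
print(unmet_items(study_a))  # ['blinded']
print(unmet_items(study_b))  # ['allocation concealed']
```

Both studies receive the same score even though their flaws differ, and the score alone cannot show which flaw is present or whether it matters for the outcome in question. The per-item view keeps that information, at the cost of requiring a judgement for each item.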

Key figures

Julian Higgins
Jonathan Sterne
Douglas Altman
Miguel Hernán

Seminal works

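Higgins et al. (2011): the Cochrane risk-of-bias tool for randomised trials
Sterne et al. (2016): ROBINS-I, for non-randomised studies of interventions
Sterne et al. (2019): RoB 2, the revised tool for randomised trials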

Frequently asked questions

How does risk of bias differ from the evidence hierarchy?

The hierarchy ranks designs in general by their typical vulnerability to bias, whereas a risk-of-bias assessment evaluates how well one specific study was actually conducted, so a high-ranking design can still carry high risk of bias.

Why have numeric quality scores fallen out of favour?

Composite scores combine unrelated features into one number and hide which flaws drive the result; domain-based tools instead make a separate, transparent judgement for each potential source of bias.