The Odds Ratio
A measure of association for categorical outcomes
The odds ratio (OR) compares the odds of an outcome between two groups: the odds in the exposed group divided by the odds in the unexposed group. OR = 1 indicates no association; values above 1 indicate higher odds with exposure, and values below 1 indicate lower odds. It is the natural effect measure in logistic regression and case-control studies. A frequent misconception treats it as equivalent to relative risk; the two are approximately equal only when the outcome is rare.
Concept and Formula
The odds of an event are the ratio of the probability that it occurs to the probability that it does not: odds = p / (1 - p). The odds ratio is the odds in one group divided by the odds in the other: OR = odds_exposed / odds_unexposed. From a 2x2 table it is calculated as OR = (a x d) / (b x c), where a is the number exposed with the outcome, b the number exposed without it, c the number unexposed with the outcome, and d the number unexposed without it. OR = 1 means no association, OR > 1 means higher odds with exposure, and OR < 1 means lower odds with exposure, often described as a protective association when the outcome is harmful.
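As a minimal sketch of the two formulas above, the following Python computes the OR both from the cross-product of a 2x2 table and from the group odds, and shows that they agree. The counts are hypothetical, chosen only for illustration, and the cell layout (a = exposed with outcome, b = exposed without, c = unexposed with, d = unexposed without) is the conventional one assumed here.

```python
def odds(p: float) -> float:
    """Odds of an event with probability p, where 0 <= p < 1."""
    return p / (1.0 - p)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio from a 2x2 table.

    Assumed layout:
      a = exposed with outcome,   b = exposed without outcome
      c = unexposed with outcome, d = unexposed without outcome
    """
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only.
a, b, c, d = 30, 70, 15, 85

# Route 1: cross-product formula.
or_table = odds_ratio(a, b, c, d)

# Route 2: odds in each group, then their ratio.
p_exposed = a / (a + b)      # risk of the outcome among the exposed
p_unexposed = c / (c + d)    # risk of the outcome among the unexposed
or_from_odds = odds(p_exposed) / odds(p_unexposed)

print(f"OR (cross-product) = {or_table:.3f}")
print(f"OR (ratio of odds) = {or_from_odds:.3f}")
```

Both routes give the same value because a / b is the odds in the exposed group and c / d is the odds in the unexposed group. A zero in cell b or c makes the cross-product undefined; this sketch does not handle that case.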
How to Read and Report It
Report the OR with a confidence interval and, where relevant, a p-value, for example OR = 2.4, 95% CI [1.1, 5.2]. If the 95% confidence interval includes 1, the result is not statistically significant at the 5% level. When interpreting the OR, discuss direction and magnitude rather than fixating on the exact number. In logistic regression output, coefficients are on the log-odds scale; exponentiate them, along with their confidence limits, to obtain ORs. Always state the reference category clearly, since the OR is defined relative to that group.
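One way to obtain such an interval from a 2x2 table is the Woolf (log-scale Wald) method, sketched below. This is one common choice among several, not a method prescribed by the text: it assumes all four cells are non-zero and reasonably large. The counts and the coefficient value `beta` are hypothetical.

```python
import math


def or_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-scale Wald) confidence interval.

    z = 1.96 gives an approximate 95% interval. All cells must be > 0.
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log_or)
    hi = math.exp(math.log(or_) + z * se_log_or)
    return or_, lo, hi


# Hypothetical counts, for illustration only.
or_, lo, hi = or_with_ci(30, 70, 15, 85)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")

# The interval excludes 1 exactly when the lower limit is above 1
# or the upper limit is below 1.
excludes_one = lo > 1 or hi < 1
print("CI excludes 1:", excludes_one)

# Converting a logistic regression coefficient (log-odds) to an OR.
beta = 0.875  # hypothetical coefficient, not from any fitted model
or_from_beta = math.exp(beta)
print(f"exp({beta}) = {or_from_beta:.2f}")
```

The interval is built on the log scale and then exponentiated, so it is symmetric around log(OR) but not around the OR itself.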
Common Misconceptions
The most common error is treating OR as equivalent to relative risk (RR). OR approximates RR only when the outcome is rare, roughly 10 percent or less. When the outcome is common, OR lies farther from 1 than RR whenever there is an association, so reading it as a risk ratio exaggerates the apparent effect size. In case-control studies RR cannot be calculated directly, because the sampling design fixes the proportion of cases, but OR can. Another misconception is reading OR = 2.0 as meaning the outcome is twice as likely or that twice as many people develop it; an OR of 2.0 says the odds are doubled, not the probability.
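The gap between OR and RR can be seen with a small numerical comparison. The sketch below uses two hypothetical pairs of risks, both with RR = 2: one where the outcome is rare and one where it is common.

```python
def risk_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Relative risk: ratio of outcome probabilities."""
    return p_exposed / p_unexposed


def odds_ratio_from_risks(p_exposed: float, p_unexposed: float) -> float:
    """Odds ratio computed from the two outcome probabilities."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical risks, for illustration only.
rr_rare = risk_ratio(0.02, 0.01)
or_rare = odds_ratio_from_risks(0.02, 0.01)

rr_common = risk_ratio(0.60, 0.30)
or_common = odds_ratio_from_risks(0.60, 0.30)

print(f"rare outcome:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")
print(f"common outcome: RR = {rr_common:.2f}, OR = {or_common:.2f}")
```

With risks of 2 percent versus 1 percent the OR is close to the RR of 2, while with risks of 60 percent versus 30 percent the same RR of 2 corresponds to an OR of 3.5.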
When to Use It and Why It Matters
OR emerges naturally from the mathematics of logistic regression and is the standard measure for binary outcomes modeled that way. In case-control designs, the researcher fixes the numbers of cases and controls, so risks cannot be estimated from the data and the OR is the measure that can be meaningfully interpreted. It is also a widely used effect measure in meta-analyses in clinical and epidemiological research. When the outcome is common in cohort or experimental studies, RR or risk difference is preferred; however, when the model is logistic regression, correctly interpreting OR is unavoidable.
Sources
- Rothman, K. J., Greenland, S., & Lash, T. L. (2008). Modern Epidemiology (3rd ed.). Lippincott Williams & Wilkins. ISBN: 978-0-7817-5564-1