ScholarGate
Assistant

Cox Regression Models

Cox regression, also called the proportional hazards model, is the most widely used method for relating one or more covariates to the hazard, or instantaneous rate, of a time-to-event outcome. Its key innovation is that it estimates how covariates multiply the hazard without requiring any assumption about the shape of the underlying (baseline) hazard, yielding interpretable hazard ratios while accommodating right-censored observations, provided censoring is non-informative.

Definition

The Cox proportional hazards model expresses a subject's hazard as an unspecified baseline hazard multiplied by the exponential of a linear combination of covariates. Its regression coefficients are estimated by maximising a partial likelihood that depends on the data only through the ordering of event times and the covariates of the subjects at risk at each of those times.
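In symbols, the definition can be written as below. The notation is chosen here for illustration: h_0(t) is the baseline hazard, x_i is subject i's covariate vector, delta_j equals 1 if subject j's time is an observed event and 0 if it is censored, and R(t_j) is the risk set of subjects still under observation at t_j. The product runs over observed events and, as written, assumes no tied event times.

```latex
h(t \mid x_i) = h_0(t)\, \exp(\beta^{\top} x_i)

\frac{h(t \mid x_a)}{h(t \mid x_b)} = \exp\{\beta^{\top}(x_a - x_b)\}

L(\beta) = \prod_{j:\, \delta_j = 1}
  \frac{h_0(t_j)\, \exp(\beta^{\top} x_j)}{\sum_{k \in R(t_j)} h_0(t_j)\, \exp(\beta^{\top} x_k)}
  = \prod_{j:\, \delta_j = 1}
  \frac{\exp(\beta^{\top} x_j)}{\sum_{k \in R(t_j)} \exp(\beta^{\top} x_k)}
```

The second line shows why the hazard ratio does not depend on t, and the third shows why h_0 cancels out of the estimation.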

Scope

This topic covers the structure of the Cox model, the partial likelihood that allows estimation without specifying the baseline hazard, the interpretation of hazard ratios, and the assumptions and diagnostics on which valid use depends. It is methodological reference material and does not constitute clinical guidance.

Core questions

  • How does the Cox model relate covariates to the hazard without specifying its baseline shape?
  • What is partial likelihood and why does it permit estimation from censored, ordered event times?
  • How is a hazard ratio interpreted, and what are its limits?
  • What assumptions and diagnostics govern valid use of the model?

Key concepts

  • Baseline hazard (unspecified)
  • Hazard ratio
  • Partial likelihood
  • Risk set and event ordering
  • Semiparametric model
  • Proportional hazards assumption
  • Tied event times
  • Time-varying covariates

Mechanisms

The model writes the hazard for a subject as the product of an arbitrary baseline hazard, common to all subjects, and a factor exp(beta'x) that scales it according to that subject's covariates. Cox's central insight was the partial likelihood: at each event time, the contribution is the conditional probability, given that one event occurred at that time, that it was the subject actually observed to fail rather than any other subject still at risk. Because the baseline hazard appears in both the numerator and the denominator of this ratio, it cancels, so the contribution depends only on the covariates and the composition of the risk set. Censored subjects contribute to every risk set up to their censoring time but add no event term of their own.

Maximising the product of these contributions gives the coefficient estimates, and exponentiating a coefficient yields a hazard ratio: the multiplicative change in the event rate per one-unit increase in that covariate, with the other covariates held fixed. Because the baseline hazard is left free, the model is semiparametric. Valid inference rests on the proportional hazards assumption, which is checked with residual-based diagnostics such as Schoenfeld residuals (Cox, 1972; Schoenfeld, 1982; Bradburn et al., 2003).
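To make the partial likelihood concrete, the following is a minimal from-scratch sketch in Python with NumPy. It assumes a small, made-up dataset with one binary covariate; the function names cox_partial_loglik and fit_cox are hypothetical, any tied times would be handled by Breslow's approach, and Newton-Raphson is one common choice of optimiser rather than the only one. It is illustrative only and not a substitute for an established survival package.

```python
import numpy as np

def cox_partial_loglik(beta, time, event, X):
    """Log partial likelihood, gradient and Hessian for a Cox model.

    Ties are handled Breslow-style: every subject with time >= t_j stays in
    the risk set of event j, including subjects tied with j.
    """
    eta = X @ beta                        # linear predictor beta'x per subject
    w = np.exp(eta)                       # factor that scales the baseline hazard
    p = X.shape[1]
    loglik, grad, hess = 0.0, np.zeros(p), np.zeros((p, p))
    for j in np.flatnonzero(event == 1):  # censored subjects add no event term
        at_risk = time >= time[j]         # risk set: still under observation at t_j
        w_r, X_r = w[at_risk], X[at_risk]
        s0 = w_r.sum()
        xbar = (w_r @ X_r) / s0           # risk-weighted mean covariate
        loglik += eta[j] - np.log(s0)     # log P(j fails | one failure in risk set)
        grad += X[j] - xbar
        hess -= (X_r * w_r[:, None]).T @ X_r / s0 - np.outer(xbar, xbar)
    return loglik, grad, hess

def fit_cox(time, event, X, n_iter=25, tol=1e-10):
    """Maximise the partial likelihood by Newton-Raphson, starting at beta = 0."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, grad, hess = cox_partial_loglik(beta, time, event, X)
        step = np.linalg.solve(hess, grad)
        beta = beta - step                # Newton update: beta - H^{-1} g
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical toy data: follow-up time, event indicator (1 = event, 0 = censored)
# and one binary covariate (e.g. exposed = 1, unexposed = 0).
time = np.array([5, 8, 12, 3, 9, 15, 7, 11, 4, 14], dtype=float)
event = np.array([1, 1, 0, 1, 1, 0, 1, 1, 1, 0])
X = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0], dtype=float).reshape(-1, 1)

beta_hat = fit_cox(time, event, X)
hazard_ratio = np.exp(beta_hat[0])        # exponentiated coefficient = hazard ratio
print(f"beta = {beta_hat[0]:.3f}, hazard ratio = {hazard_ratio:.2f}")
```

The baseline hazard never appears in the code: the fit uses only the event ordering, the risk sets and the covariates, which is the semiparametric property described above.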

Clinical relevance

Most adjusted estimates of prognostic factors and treatment effects on survival in the clinical literature come from Cox models reported as hazard ratios; understanding the model supports appraisal of those estimates, including whether confounders were addressed and assumptions checked. The entry is descriptive of methodology and not a basis for individual clinical decisions.

Epidemiology

Cox regression is the default multivariable method for time-to-event outcomes across clinical and epidemiologic research; Cox's 1972 paper is one of the most cited statistical papers ever published, reflecting near-universal adoption (Cox, 1972).

Evidence & guidelines

There are no clinical guidelines for the model itself; the methodological references are Cox's 1972 paper, diagnostic developments based on partial residuals (Schoenfeld, 1982), and texts covering extensions and good practice (Therneau & Grambsch, 2000; Collett, 2015), alongside tutorials for medical audiences (Bradburn et al., 2003).

History

Cox introduced the proportional hazards model and the partial likelihood in his 1972 paper, which transformed survival analysis by allowing covariate-adjusted regression without committing to a parametric baseline hazard. The justification of partial likelihood as a basis for inference, and a body of diagnostics and extensions (stratification, time-varying covariates, residual checks), followed over subsequent decades (Schoenfeld, 1982; Therneau & Grambsch, 2000).

Debates

How should tied event times be handled?
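When several subjects have events at the same recorded time, the partial likelihood must be adjusted. Breslow's method, the simplest, keeps every tied subject in the denominator for each of the tied events; Efron's method approximates the exact likelihood more closely by progressively down-weighting the tied subjects; and exact methods avoid approximation at greater computational cost. The choice rarely changes conclusions but matters when ties are heavy, and it is a standard implementation decision.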
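The difference between the two approximations can be seen in the log-likelihood contribution of a single event time with d tied failures. The sketch below, in Python with NumPy, assumes the linear predictors beta'x have already been computed; the function names breslow_term and efron_term are hypothetical.

```python
import numpy as np

def breslow_term(eta_risk, eta_events):
    """Contribution of one event time under Breslow's approximation.

    eta_risk: beta'x for everyone at risk (the tied failures included);
    eta_events: beta'x for the d subjects who failed at this time.
    """
    d = len(eta_events)
    return eta_events.sum() - d * np.log(np.exp(eta_risk).sum())

def efron_term(eta_risk, eta_events):
    """Same contribution under Efron's approximation: the tied subjects'
    weight is removed from the denominator in steps of 1/d."""
    d = len(eta_events)
    s_risk = np.exp(eta_risk).sum()
    s_tied = np.exp(eta_events).sum()
    return eta_events.sum() - sum(np.log(s_risk - (k / d) * s_tied) for k in range(d))

# Four subjects at risk, two of whom fail at the same time, evaluated at beta = 0
eta_risk = np.zeros(4)
eta_events = np.zeros(2)
print(breslow_term(eta_risk, eta_events))   # equals -2*log(4)
print(efron_term(eta_risk, eta_events))     # equals -(log(4) + log(3))
```

With a single failure (d = 1) the two functions return the same value; they diverge only as ties accumulate, which is why the choice matters mainly when ties are heavy.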

Key figures

  • David R. Cox
  • David Schoenfeld
  • Terry Therneau
  • Patricia Grambsch

Seminal works

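  • Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society, Series B, 34(2), 187–220.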

Frequently asked questions

Why is the Cox model called semiparametric?
It models the covariate effect parametrically through exp(beta'x) but leaves the baseline hazard completely unspecified, so it combines a parametric regression part with a nonparametric baseline.
What does a hazard ratio of 2 from a Cox model mean?
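It means the model estimates the instantaneous event rate to be twice as high in the compared group relative to the reference group, or per one-unit increase in the covariate, with other covariates held fixed, assuming that ratio is constant over follow-up (the proportional hazards assumption). It does not mean the probability of the event by a given time is doubled; that cumulative risk ratio is below 2 and shrinks as the baseline risk grows.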
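Under proportional hazards the survival curves are linked by S1(t) = S0(t)^HR, so a hazard ratio translates into a ratio of cumulative risks that depends on the baseline risk. The short Python sketch below uses hypothetical baseline survival values and a hypothetical helper, survival_under_ph, to show that a hazard ratio of 2 corresponds to a risk ratio below 2.

```python
def survival_under_ph(s0, hr):
    """Comparison-group survival at time t when S1(t) = S0(t) ** HR,
    which holds at every t if the hazards are proportional."""
    return s0 ** hr

# Hypothetical baseline survival probabilities at some fixed time t
for s0 in (0.95, 0.80, 0.50):
    s1 = survival_under_ph(s0, hr=2.0)
    risk0, risk1 = 1 - s0, 1 - s1        # cumulative event probability by time t
    print(f"S0={s0:.2f}  risk0={risk0:.3f}  risk1={risk1:.3f}  "
          f"risk ratio={risk1 / risk0:.2f}")
```

With a hazard ratio of 2, the risk ratio is 1.95, 1.80 and 1.50 at baseline survival of 0.95, 0.80 and 0.50, approaching 2 only when events are rare.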
