ScholarGate

Simple Linear Regression

Simple linear regression models the expected value of a continuous outcome as a straight-line function of a single explanatory variable. It estimates an intercept and a slope by least squares, where the slope expresses how much the outcome changes, on average, for each one-unit increase in the predictor. It is the foundational regression model from which more elaborate models are built.

Definition

Simple linear regression fits the model E(Y | X) = a + bX, estimating the intercept a and slope b by minimising the sum of squared residuals (ordinary least squares), so that the slope quantifies the average change in the continuous outcome Y per one-unit increase in the single predictor X.
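A minimal Python sketch of the least-squares estimates, using only the standard library. It relies on the closed-form solution b = Sxy / Sxx and a = mean(Y) − b · mean(X), where Sxx is the sum of squared deviations of X from its mean and Sxy is the sum of cross-products of the deviations of X and Y. The function name `ols_fit` and the small data set are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return the least-squares intercept a and slope b for E(Y | X) = a + bX."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sxx: sum of squared deviations of X; Sxy: sum of cross-products of deviations.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("X must take at least two distinct values")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx        # slope: average change in Y per one-unit increase in X
    a = y_bar - b * x_bar  # intercept: makes the line pass through (mean X, mean Y)
    return a, b


# Hypothetical example data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = ols_fit(x, y)
# Residuals: observed minus fitted values.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(f"a = {a:.3f}, b = {b:.3f}")  # prints: a = 0.050, b = 1.990
```

Because the model includes an intercept, the least-squares residuals sum to zero and are uncorrelated with X; both properties are a quick sanity check on any hand-rolled fit.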

Scope

This entry covers the straight-line model with one predictor: the meaning of the intercept and slope, least-squares estimation, the assumptions of linearity, independence, constant variance, and approximately normal residuals, and the interpretation of the fit through confidence intervals, prediction, and the coefficient of determination. It is a methodological topic, not clinical guidance.

Core questions

  • How is a straight line fitted to data, and what does 'least squares' minimise?
  • What do the intercept and slope mean substantively?
  • What assumptions must hold for the estimates and their confidence intervals to be valid?
  • How does simple linear regression relate to the correlation coefficient?
  • How is the fitted line used for estimation versus prediction?

Key concepts

  • Intercept and slope
  • Ordinary least squares
  • Residuals
  • Assumptions: linearity, independence, constant variance, normal errors
  • Confidence interval for the slope
  • Coefficient of determination (R-squared)
  • Confidence versus prediction intervals
  • Regression toward the mean

Mechanisms

The model posits that the mean of the outcome lies on a straight line in the predictor, with individual observations scattered around that line. Ordinary least squares chooses the intercept and slope that minimise the sum of squared vertical distances (residuals) between observed and fitted values. The slope estimate has a standard error from which a confidence interval and hypothesis test follow; these are valid when the relationship is in fact linear and the residuals are independent, have roughly constant variance, and are approximately normally distributed.

The coefficient of determination, R-squared, reports the proportion of the variance in the outcome explained by the fitted line. With a single predictor it equals the square of the Pearson correlation r, and the slope equals r multiplied by the ratio of the standard deviation of the outcome to that of the predictor, so the slope and r always share the same sign. A confidence interval describes uncertainty in the mean outcome at a given predictor value, whereas a prediction interval, which is wider, describes uncertainty in an individual future observation.
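The inference for the slope can be sketched in Python as follows; the function name `slope_inference`, its dictionary keys, and the example data are hypothetical. The standard error of the slope is s / √Sxx, where s is the residual standard deviation on n − 2 degrees of freedom, and a 95% interval is b ± t × SE. The Python standard library has no Student t quantile function, so the critical value is passed in as an assumed parameter `t_crit` (it could be obtained from, for example, `scipy.stats.t.ppf(0.975, n - 2)`).

```python
import math
from statistics import fmean


def slope_inference(x: list[float], y: list[float], t_crit: float) -> dict:
    """Slope, its standard error, a t-based confidence interval, R-squared and r.

    t_crit is the two-sided critical value of Student's t on n - 2 degrees
    of freedom; the caller supplies it because the standard library has no
    t quantile function.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the residual variance")
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Residual sum of squares; n - 2 df because both a and b were estimated.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    se_b = s / math.sqrt(s_xx)    # standard error of the slope
    return {
        "b": b,
        "se_b": se_b,
        "ci": (b - t_crit * se_b, b + t_crit * se_b),
        "r_squared": 1 - sse / s_yy,          # proportion of outcome variance explained
        "r": s_xy / math.sqrt(s_xx * s_yy),   # Pearson correlation
        "sd_ratio": math.sqrt(s_yy / s_xx),   # SD(Y) / SD(X); the n - 1 terms cancel
    }


# Hypothetical data; 3.182 is approximately the 97.5th percentile of t with 3 df.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
res = slope_inference(x, y, t_crit=3.182)
lo, hi = res["ci"]
print(f"b = {res['b']:.3f}, 95% CI {lo:.3f} to {hi:.3f}, R^2 = {res['r_squared']:.4f}")
# prints: b = 1.990, 95% CI 1.800 to 2.180, R^2 = 0.9973
```

In this sketch `r_squared` and `r ** 2` agree to rounding error, and `r * sd_ratio` reproduces the slope, which makes the link between simple regression and correlation explicit.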

Clinical relevance

Simple linear regression appears throughout the health literature to describe how one continuous measurement relates to another and to construct reference relationships and calibration lines. Recognising its assumptions is part of appraising such analyses. This entry describes the method and is not a basis for individual diagnostic or treatment decisions.

Evidence & guidelines

Standard medical-statistics texts and the BMJ Statistics Notes series describe how regression lines, slopes, and their confidence intervals should be reported and interpreted, and emphasise checking residuals before relying on a fitted line.

History

The straight-line model traces back to Francis Galton's nineteenth-century observation of 'regression toward the mean' in heritable traits, the phenomenon that gave regression its name, and to the least-squares method developed earlier in astronomy and geodesy. Pearson and successors formalised inference for the slope, and the model became the entry point for the broader regression apparatus of modern biostatistics.

Key figures

  • Francis Galton
  • Karl Pearson
  • Douglas Altman
  • Martin Bland

Seminal works

  • altman-1991
  • kutner-2005

Frequently asked questions

What does the slope in a simple linear regression mean?
The slope is the average change in the outcome for each one-unit increase in the predictor. Its confidence interval and p-value indicate how precisely it is estimated and whether the association is distinguishable from no linear relationship, that is, from a slope of zero.
What is the difference between a confidence interval and a prediction interval for a regression line?
A confidence interval expresses uncertainty about the mean outcome at a given predictor value, while a prediction interval, which is wider, expresses uncertainty about an individual new observation at that value because it also includes the scatter of points around the line.
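To make the distinction concrete, here is a hedged Python sketch of both intervals at a chosen predictor value x0; the function name `mean_and_prediction_intervals` and the data are hypothetical. Both intervals use the residual standard deviation s and the term 1/n + (x0 − mean(X))² / Sxx; the prediction interval adds 1 under the square root for the scatter of a single new observation around the line, which is why it is always wider. The assumed input `t_crit` is the two-sided Student t critical value on n − 2 degrees of freedom.

```python
import math
from statistics import fmean


def mean_and_prediction_intervals(x, y, x0, t_crit):
    """Return (fitted value, CI for the mean of Y at x0, PI for a new Y at x0)."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    s = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = a + b * x0
    # Uncertainty in the fitted line itself; smallest when x0 equals the mean of X.
    h = 1 / n + (x0 - x_bar) ** 2 / s_xx
    se_mean = s * math.sqrt(h)
    # A single new observation also scatters around the line, so add 1 under the root.
    se_pred = s * math.sqrt(1 + h)
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, ci, pi


# Hypothetical data; t_crit = 3.182 approximates the 97.5th percentile of t with 3 df.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat, ci, pi = mean_and_prediction_intervals(x, y, x0=3.5, t_crit=3.182)
print(f"fitted {y_hat:.3f}; CI {ci[0]:.3f} to {ci[1]:.3f}; PI {pi[0]:.3f} to {pi[1]:.3f}")
```

Both intervals are narrowest at the mean of X and widen as x0 moves away from it, so predictions far from the bulk of the data carry more uncertainty.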