Sample Size Calculation

Sample size calculation is the procedure for determining how many participants a study needs to answer its question with acceptable reliability. By combining a target effect size, an accepted false-positive rate, a desired power, and the expected variability of the outcome, it yields the number of subjects required so that a real effect is likely to be detected and a chance finding is unlikely to be mistaken for one. It is a planning step that turns a research question into a concrete recruitment target.

Definition

A sample size calculation determines the number of study participants needed to detect a prespecified effect size with a chosen statistical power (typically 80% or 90%) at a chosen significance level (commonly two-sided 0.05), given the expected variability of the outcome.

Scope

The entry covers the logic and ingredients of a sample size calculation, the roles of significance level and power, the influence of effect size and outcome variability, and adjustments for anticipated dropout. It treats sample size as a methodological planning topic within study design, including its use in trials and observational studies, and does not give numeric formulas as clinical instructions.

Key concepts

Significance level (alpha) and type I error
Statistical power and type II error (beta)
Effect size and minimal clinically important difference
Outcome variability (variance or event rate)
Allocation ratio between groups
Inflation for anticipated attrition
Pilot and feasibility sample sizes

Mechanisms

A calculation links the sample size to four other quantities: the significance level, the power, the effect size to be detected, and the variability of the outcome. Fixing all but one of these determines the remaining one, which in planning is usually the sample size. Smaller target effects, greater outcome variability, higher power, and stricter significance levels all increase the required number of subjects. Because the required number grows with the variance of the outcome and with the inverse square of the target effect, halving the effect to be detected roughly quadruples the sample. For continuous outcomes the relevant variability is the standard deviation; for binary outcomes it is determined by the expected event rates in each group. The planned number is then inflated to offset expected losses to follow-up so that the analyzed sample retains adequate power. Pilot studies estimate feasibility and variability rather than effect size, and use separate sizing rules.
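The relationships above can be illustrated with the common normal-approximation formulas for comparing two independent groups. This is a minimal sketch for understanding, not a planning tool: it assumes a two-sided test, 1:1 allocation, a known common standard deviation for continuous outcomes, and unpooled variances for binary outcomes. The function names and example numbers are hypothetical. Designs with unequal allocation, clustering, repeated measures, or non-normal outcomes need other methods.

```python
from math import ceil
from statistics import NormalDist

# Standard normal distribution, used only for its quantile function.
STD_NORMAL = NormalDist()


def z_quantiles(alpha, power):
    """Return (z for a two-sided significance level alpha, z for the desired power)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return z_alpha, z_beta


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Hypothetical helper: per-group size to detect a mean difference delta, common SD sd, 1:1 allocation."""
    z_alpha, z_beta = z_quantiles(alpha, power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Hypothetical helper: per-group size to detect event rates p1 vs p2 (unpooled variance, 1:1)."""
    z_alpha, z_beta = z_quantiles(alpha, power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


def inflate_for_dropout(n, dropout_rate):
    """Enrol enough that about n per group remain after the expected fraction drops out."""
    return ceil(n / (1 - dropout_rate))


if __name__ == "__main__":
    n_cont = n_per_group_means(delta=5, sd=10)                  # 63 per group at 80% power
    n_cont_90 = n_per_group_means(delta=5, sd=10, power=0.90)   # 85 per group at 90% power
    n_bin = n_per_group_proportions(p1=0.50, p2=0.40)           # 385 per group
    print(n_cont, n_cont_90, n_bin, inflate_for_dropout(n_cont, 0.15))  # 63 85 385 75
```

Raising power from 80% to 90% moves the continuous-outcome requirement from 63 to 85 per group. The dropout step divides by the fraction expected to remain rather than adding a flat percentage, so that the analyzed sample still reaches the calculated size.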

Clinical relevance

An adequately sized study is more likely to give a trustworthy answer, while an underpowered one risks missing real effects and a needlessly large one exposes extra participants without benefit; appraising whether a study was appropriately sized is therefore part of judging its evidence. This entry describes a research-planning method and is not a basis for individual clinical decisions.

Evidence & guidelines

Reporting standards require that the sample size and the assumptions behind it be stated: CONSORT 2010 asks trials to report how the sample size was determined, including the targeted effect, power, and significance level. Methodological reviews note that reported calculations are often incompletely justified, and dedicated work on pilot and feasibility studies (for example rule-of-thumb and confidence-interval approaches) addresses how to size early-phase studies whose purpose is estimation rather than hypothesis testing.

History

Sample size reasoning became routine as the Neyman-Pearson framework, with its explicit type I and type II error rates, was adopted in the mid-twentieth century, giving power a formal role in planning. Standard medical-statistics texts in the later twentieth century made the calculations accessible to clinical researchers, and reporting guidelines such as CONSORT later required that the calculation and its assumptions be disclosed. More recent work has refined how to size pilot and feasibility studies, distinguishing them from definitive trials.

Debates

How should the target effect size be chosen?: Calculations are sensitive to the assumed effect, and choosing an optimistically large effect to justify a small sample (sometimes called the "sample size samba") can leave a study underpowered for a smaller but clinically meaningful difference; the target should reflect the smallest difference worth detecting rather than whatever makes recruitment convenient.
How large should a pilot or feasibility study be?: Because pilots aim to assess feasibility and estimate variability rather than to test a hypothesis, they are sized by rules of thumb or by precision-based rather than power-based reasoning, and the appropriate size remains an area of active methodological work.
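The sensitivity to the assumed effect in the first debate can be made concrete with the same normal approximation. In this hypothetical example a study is sized for an optimistic 10-unit difference (standard deviation 10) when the smallest difference worth detecting is 5 units. The function names and numbers are illustrative assumptions, and the power formula is an approximation that ignores the negligible chance of rejecting in the wrong direction.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Hypothetical helper: per-group size for a two-sided comparison of two means, 1:1 allocation."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


def approx_power(n, delta, sd, alpha=0.05):
    """Approximate power with n per group when the true difference is delta."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n)  # SE of the difference in means
    return STD_NORMAL.cdf(abs(delta) / standard_error - z_alpha)


if __name__ == "__main__":
    optimistic_n = n_per_group(delta=10, sd=10)  # sized for an optimistic effect
    realistic_n = n_per_group(delta=5, sd=10)    # sized for the smallest worthwhile effect
    print(optimistic_n, realistic_n)             # 16 63
    # Power of the small study if the optimistic effect were real vs. the meaningful one.
    print(round(approx_power(optimistic_n, delta=10, sd=10), 2))  # 0.81
    print(round(approx_power(optimistic_n, delta=5, sd=10), 2))   # 0.29
```

Halving the target effect roughly quadruples the required size (16 versus 63 per group), so the study sized for the optimistic effect has only about a 29% chance of detecting the difference that actually matters.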

Key figures

Kenneth Schulz
David Grimes
Douglas Altman
Steven Julious
Michael Campbell

Seminal works

schulz-grimes-2005-sampsize
moher-2010-consort-ss
altman-1991-textbook

Frequently asked questions

What information do I need before I can calculate a sample size?: At minimum the significance level (often 0.05), the desired power (often 80% or 90%), the smallest effect worth detecting, and an estimate of the outcome's variability or baseline event rate; for planning you also add an allowance for expected dropout.
Why can a study be too large as well as too small?: An underpowered study may miss a real effect, but an unnecessarily large one exposes additional participants to study procedures and consumes resources without improving the answer, so the goal is an appropriate size, not simply a big one.