Population Stratification and Human Genetic Diversity

Human genetic diversity is structured: allele frequencies vary in a patterned way across geography and ancestry, so that populations are differentiated rather than a single undivided pool. Population stratification refers to this structure and to the bias it can introduce into genetic studies when cases and controls differ systematically in ancestry.

Definition

Population stratification is the presence of systematic differences in allele frequencies between subpopulations within a sample, arising from distinct ancestry; genetic diversity here refers to how that variation is partitioned within and among human populations.

Scope

The entry covers measures of genetic differentiation among populations, the broad pattern of worldwide human diversity, and stratification as a confounder in association studies, together with the methods used to detect and correct it. The treatment is methodological and descriptive; it does not assign clinical or social meaning to population categories.

Core questions

  • How is genetic differentiation between populations quantified?
  • How is human genetic variation partitioned within versus among populations?
  • How does population stratification bias genetic association studies?
  • How is stratification detected and corrected statistically?

Key concepts

  • F-statistics and FST
  • Within- vs among-population diversity
  • Isolation by distance
  • Principal components of ancestry
  • Confounding in association studies
  • Reference population panels

Key theories

F-statistics and the partition of diversity
Wright's hierarchical F-statistics, formalised for estimation by Nei and by Weir and Cockerham, partition genetic variance into within- and among-population components; FST summarises the proportion of total diversity attributable to differences among populations and is the standard measure of differentiation.
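
As a minimal sketch of the partition described above, the following computes FST for a single biallelic locus in the gene-diversity form (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within subpopulations and H_T is the expected heterozygosity of the pooled population. It assumes equally weighted subpopulations and known allele frequencies, and it omits the sample-size corrections that the Weir-Cockerham estimator applies. The function name nei_fst is illustrative, not taken from any library.

```python
from typing import Sequence


def nei_fst(freqs: Sequence[float]) -> float:
    """FST for one biallelic locus, in the gene-diversity form (H_T - H_S) / H_T.

    Assumption: subpopulations are equally weighted and `freqs` holds the
    true allele frequency in each one (no sampling correction is applied).
    """
    if not freqs:
        raise ValueError("need at least one subpopulation")
    # H_S: mean expected heterozygosity within subpopulations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    # H_T: expected heterozygosity at the pooled (mean) allele frequency.
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus fixed everywhere: no diversity to partition
    return (h_t - h_s) / h_t


print(nei_fst([0.5, 0.5]))  # identical frequencies: no differentiation
print(nei_fst([0.2, 0.8]))  # divergent frequencies: about 0.36
```

With frequencies 0.2 and 0.8, H_S = 0.32 and H_T = 0.50, so 36% of the diversity at this locus is attributable to the difference between the two subpopulations.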

Mechanisms

Differentiation accumulates as genetic drift, limited migration, and local selection cause allele frequencies to diverge between populations; FST captures the share of total diversity attributable to these among-population differences. In humans, most genetic variation lies within populations, with a smaller but structured component among them that tracks geography. In association studies, spurious associations arise when ancestry differs between cases and controls and a variant's frequency also varies with ancestry. Methods that summarise ancestry, notably principal-components analysis of genome-wide genotypes, are used to detect and adjust for this stratification.
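
The detection step can be sketched on simulated data. The example below assumes NumPy is available, draws two populations whose allele frequencies have diverged from a shared ancestral frequency (a Balding-Nichols-style beta model, chosen here purely for illustration), and checks whether the first principal component of the standardised genotype matrix recovers population membership. All function names, sample sizes, and the FST value of 0.1 are hypothetical choices, not values from the entry.

```python
import numpy as np


def simulate_genotypes(n_per_pop=100, n_snps=500, fst=0.1, seed=0):
    """Simulate 0/1/2 genotypes for two diverged populations (illustrative model)."""
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.1, 0.9, size=n_snps)
    # Beta parameters giving mean `ancestral` and drift variance set by `fst`.
    a = ancestral * (1 - fst) / fst
    b = (1 - ancestral) * (1 - fst) / fst
    blocks = []
    for _ in range(2):
        pop_freq = rng.beta(a, b)  # population-specific allele frequencies
        blocks.append(rng.binomial(2, pop_freq, size=(n_per_pop, n_snps)))
    return np.vstack(blocks), np.repeat([0, 1], n_per_pop)


def ancestry_pcs(genotypes, n_pcs=2):
    """Principal-component scores of the standardised genotype matrix."""
    g = genotypes.astype(float)
    p_hat = g.mean(axis=0) / 2            # estimated allele frequency per SNP
    keep = (p_hat > 0) & (p_hat < 1)      # drop monomorphic SNPs
    g, p_hat = g[:, keep], p_hat[keep]
    # Centre each SNP and scale by its expected binomial standard deviation.
    z = (g - 2 * p_hat) / np.sqrt(2 * p_hat * (1 - p_hat))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


genotypes, labels = simulate_genotypes()
pcs = ancestry_pcs(genotypes)
pc1_label_corr = abs(np.corrcoef(pcs[:, 0], labels)[0, 1])
print(f"|corr(PC1, population)| = {pc1_label_corr:.2f}")
```

In real data the population labels are not supplied to the analysis; the point of the sketch is that the leading component recovers them from genotypes alone, which is what makes it usable as an ancestry covariate.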

Clinical relevance

Accounting for population structure is essential to the validity of genetic association studies that inform medical knowledge, because uncorrected stratification can generate false associations. Awareness of diversity also bears on the transferability of genomic findings across populations. This entry describes population structure as a methodological consideration and is not a basis for individual diagnostic or treatment decisions.

Evidence & guidelines

The estimation of differentiation rests on Nei's gene-diversity analysis and the Weir-Cockerham F-statistics, while genome-wide surveys of worldwide human variation and large reference panels describe the empirical structure of human diversity. Principal-components correction is a standard method for handling stratification in association studies.
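
One way to picture principal-components correction is as a regression in which the leading component enters as a covariate alongside the tested variant. The sketch below assumes NumPy, uses a quantitative trait and ordinary least squares for simplicity (a case-control analysis would typically use logistic regression), and simulates a trait that depends on population membership only, so any effect attributed to the test SNP is spurious. Every number and name in it is a hypothetical choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_pop, n_background = 1000, 500
labels = np.repeat([0, 1], n_per_pop)

# Background SNPs whose frequencies differ between the two populations.
freq_pop0 = rng.uniform(0.1, 0.9, size=n_background)
freq_pop1 = np.clip(freq_pop0 + rng.normal(0, 0.2, size=n_background), 0.05, 0.95)
freqs = np.where(labels[:, None] == 0, freq_pop0, freq_pop1)
background = rng.binomial(2, freqs)

# Test SNP: frequency differs by population but has no effect on the trait.
test_snp = rng.binomial(2, np.where(labels == 0, 0.1, 0.7))
# Trait depends on population membership only (the confounder).
phenotype = 1.0 * labels + rng.normal(0, 1, size=labels.size)

# PC1 of the standardised background genotypes serves as the ancestry summary.
z = background - background.mean(axis=0)
z = z / z.std(axis=0)
u, s, _ = np.linalg.svd(z, full_matrices=False)
pc1 = u[:, 0] * s[0]


def snp_effect(snp, covariates=()):
    """Least-squares coefficient of `snp` with an intercept and optional covariates."""
    design = np.column_stack([np.ones(snp.size), snp, *covariates])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return coef[1]


unadjusted = snp_effect(test_snp)
adjusted = snp_effect(test_snp, covariates=[pc1])
print(f"unadjusted effect: {unadjusted:.3f}")
print(f"PC1-adjusted effect: {adjusted:.3f}")
```

The adjusted estimate shrinks toward zero rather than reaching it exactly, because PC1 is an estimated proxy for ancestry and the sample is finite.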

History

Wright introduced F-statistics to describe structured populations, and Nei and then Weir and Cockerham provided practical estimators. Debates over how human variation is apportioned date to Lewontin's work in the 1970s; genome-wide genotyping later mapped worldwide human relationships in detail, and principal-components methods became standard for controlling stratification once large genotype datasets emerged.

Key figures

  • Sewall Wright
  • Masatoshi Nei
  • Bruce Weir
  • David Reich
  • Alkes Price

Seminal works

  • Nei (1973)
  • Weir and Cockerham (1984)
  • Price et al. (2006)

Frequently asked questions

What does FST actually measure?
FST is the proportion of total genetic diversity that is due to allele-frequency differences among populations rather than variation within them. It ranges from 0, when populations share identical allele frequencies, to 1, when they are fixed for different alleles.

Why is population stratification a problem in association studies?
If cases and controls differ in their ancestral composition, any variant whose frequency also varies with ancestry can appear associated with the trait even when it has no causal role, so stratification must be detected and corrected.
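
A small worked example makes the arithmetic concrete. The numbers below are hypothetical: two subpopulations, A and B, carry an allele at frequencies 0.10 and 0.50, and within each subpopulation cases and controls have the same frequency. Only the ancestry mix differs between the case and control groups.

```python
from typing import Mapping


def pooled_allele_freq(mix: Mapping[str, float], freq: Mapping[str, float]) -> float:
    """Allele frequency in a group made up of subpopulations in proportions `mix`."""
    return sum(mix[pop] * freq[pop] for pop in mix)


# Hypothetical values, chosen for illustration only.
allele_freq = {"A": 0.10, "B": 0.50}   # identical for cases and controls within each subpopulation
case_mix = {"A": 0.20, "B": 0.80}      # cases drawn mostly from B
control_mix = {"A": 0.80, "B": 0.20}   # controls drawn mostly from A

case_freq = pooled_allele_freq(case_mix, allele_freq)        # 0.2*0.10 + 0.8*0.50 = 0.42
control_freq = pooled_allele_freq(control_mix, allele_freq)  # 0.8*0.10 + 0.2*0.50 = 0.18

# Allelic odds ratio in the pooled sample; roughly 3.3 with these numbers.
odds_ratio = (case_freq / (1 - case_freq)) / (control_freq / (1 - control_freq))
print(f"cases: {case_freq:.2f}, controls: {control_freq:.2f}, odds ratio: {odds_ratio:.2f}")
```

The pooled comparison suggests a strong association, yet the odds ratio within either subpopulation is exactly 1, since the allele frequency there does not differ between cases and controls.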
