ScholarGate

Allele Frequency Spectrum and Site Frequency Spectrum

The allele (or site) frequency spectrum is the distribution of variant frequencies in a sample: for a sample of n sequences, it counts how many variable sites have an allele present in one copy, two copies, and so on up to n − 1 copies. Because demographic history and natural selection each leave a distinctive imprint on this distribution, the spectrum is one of the most informative summaries of population genetic data.

Definition

The site frequency spectrum is the distribution, across all variable sites in a sample of n sequences, of the count of the derived (or minor) allele; equivalently, it tabulates how many sites carry that allele in exactly i copies, for each i from 1 to n − 1 (or up to n/2 when the minor allele is counted).
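
A minimal sketch of the definition, assuming phased haplotypes coded as a 0/1 NumPy array (rows are the n sampled sequences, columns are sites, 1 marks the derived allele) and an ancestral state that is already known. The function name `unfolded_sfs` is hypothetical and not taken from any particular library. Sites with count 0 or n are dropped because they are not segregating in the sample.

```python
import numpy as np


def unfolded_sfs(haplotypes):
    """Unfolded SFS from a 0/1 haplotype matrix (rows = n sequences, cols = sites).

    Entry i-1 of the result is the number of sites whose derived allele
    (coded 1) appears in exactly i of the n sequences, for i = 1..n-1.
    """
    h = np.asarray(haplotypes, dtype=int)
    n = h.shape[0]
    counts = h.sum(axis=0)                      # derived-allele count per site
    seg = counts[(counts > 0) & (counts < n)]   # keep only segregating sites
    return np.bincount(seg, minlength=n)[1:n]   # tally counts 1..n-1


# Toy data: 4 sequences, 5 sites (site 3 carries the derived allele in every
# sequence and site 4 in none, so both are excluded).
H = [[0, 1, 1, 0, 1],
     [0, 0, 1, 0, 1],
     [1, 0, 1, 0, 0],
     [0, 0, 1, 0, 0]]
print(unfolded_sfs(H).tolist())  # [2, 1, 0]: two singletons, one doubleton, no tripletons
```

Real pipelines would also have to handle missing genotypes and uncertain ancestral states, which this sketch ignores.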

Scope

The entry covers the definition of the folded and unfolded frequency spectrum, its expectation under neutral models, and its use as a target for demographic and selection inference. It is a methodological topic and does not provide clinical interpretation of any specific variant frequency.

Core questions

  • How is the frequency spectrum defined for a sample of sequences?
  • What does the spectrum look like under a neutral, constant-size population?
  • How do population growth, bottlenecks, and selection distort the spectrum?
  • How is the spectrum used to fit demographic models?

Key concepts

  • Folded vs unfolded spectrum
  • Derived and ancestral alleles
  • Excess of rare variants
  • Tajima's D
  • Demographic inference
  • Effect of population growth on rare variation

Key theories

Coalescent expectation of the frequency spectrum
Under the standard neutral coalescent the expected number of sites with a derived allele present i times is proportional to 1/i, giving a characteristic excess of rare variants; deviations from this expectation, summarised by statistics such as Tajima's D, are used to detect demographic change or selection.
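
Both the neutral expectation and Tajima's D can be computed directly from a spectrum. The sketch below assumes an unfolded SFS stored as a sequence whose entry i − 1 holds the number of sites with derived count i; the function names are illustrative. It uses the standard constants of Tajima's statistic, contrasting the mean pairwise difference π with Watterson's estimator S/a_1, so an excess of rare variants pushes D below zero.

```python
import numpy as np


def neutral_expected_sfs(n, theta):
    """Expected unfolded SFS under the standard neutral coalescent: theta / i, i = 1..n-1."""
    return theta / np.arange(1, n)


def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS (entry i-1 = sites with derived count i).

    Undefined when fewer than two sites segregate (the denominator is zero).
    """
    sfs = np.asarray(sfs, dtype=float)
    n = len(sfs) + 1
    i = np.arange(1, n)
    S = sfs.sum()                                         # segregating sites
    pi = np.sum(i * (n - i) * sfs) / (n * (n - 1) / 2)    # mean pairwise differences
    a1 = np.sum(1.0 / i)
    a2 = np.sum(1.0 / i**2)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / np.sqrt(e1 * S + e2 * S * (S - 1))


print(tajimas_d(neutral_expected_sfs(10, 5.0)))   # zero up to floating-point error
print(tajimas_d([20, 2, 1, 1, 0, 0, 0, 0, 1]))   # negative: hypothetical singleton-rich spectrum
```

Feeding in the exact neutral expectation makes π equal to S/a_1, which is why D vanishes there; real data fluctuate around zero even under neutrality.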

Mechanisms

Each segregating site contributes its sample allele count to the spectrum. Under the neutral coalescent the expected spectrum is heavily weighted toward rare variants, and demographic events reshape it predictably: recent population growth inflates the count of very rare variants, while a recent bottleneck depletes them and shifts weight toward intermediate frequencies. Selection also distorts the spectrum, typically near the selected sites: purifying selection keeps deleterious alleles rare, and a recent selective sweep leaves an excess of both rare and high-frequency derived variants. Inference methods fit demographic or selection models by matching the observed spectrum to its expectation, and deep sequencing of large samples has shown an abundance of rare coding variants consistent with recent, rapid human population growth.
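
Matching an observed spectrum to its expectation can be sketched with the simplest possible model: the neutral, constant-size expectation θ/i with θ as the only free parameter, scored by a Poisson composite likelihood that treats each SFS entry as independent. This is an assumed, illustrative setup; tools such as ∂a∂i and fastsimcoal2 instead compute the expected spectrum under richer demographic models. The function names and the observed spectrum are hypothetical.

```python
import numpy as np
from math import lgamma


def neutral_expected_sfs(n, theta):
    """Expected unfolded SFS under the standard neutral model: theta / i, i = 1..n-1."""
    return theta / np.arange(1, n)


def poisson_loglik(observed, expected):
    """Poisson composite log-likelihood of an observed SFS given an expected SFS."""
    obs = np.asarray(observed, dtype=float)
    exp = np.asarray(expected, dtype=float)
    log_fact = np.array([lgamma(k + 1.0) for k in obs])   # log(k!)
    return float(np.sum(obs * np.log(exp) - exp - log_fact))


def fit_neutral_theta(observed):
    """Maximum-likelihood theta under theta / i; equals S / a_n (Watterson's estimator)."""
    obs = np.asarray(observed, dtype=float)
    n = len(obs) + 1
    a_n = np.sum(1.0 / np.arange(1, n))
    return obs.sum() / a_n


# Hypothetical spectrum for n = 10 sequences with a surplus of singletons.
observed = np.array([20, 2, 1, 1, 0, 0, 0, 0, 1])
theta_hat = fit_neutral_theta(observed)
expected = neutral_expected_sfs(len(observed) + 1, theta_hat)
print(round(theta_hat, 2), poisson_loglik(observed, expected))
print(np.round(observed / expected, 2))   # ratios above 1 mark classes the model under-predicts
```

Because a single θ can only rescale the 1/i shape, the observed-to-expected ratios show where the model fails; here the singleton class is under-predicted, a pattern consistent with (though not proof of) recent growth.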

Clinical relevance

The frequency spectrum underlies the allele-frequency thresholds used when filtering variants in clinical genomics, because the rarity of a variant in reference populations is part of how its potential significance is weighed. This entry explains the population-level distribution of frequencies and is not a basis for individual diagnostic or treatment decisions.

Evidence & guidelines

The neutral expectation of the spectrum traces to coalescent theory and to Tajima's test statistic, while empirical studies of human exomes and demographic-inference frameworks show how the observed spectrum is used to reconstruct population history and to characterise the rare-variant burden.

History

The frequency spectrum emerged from diffusion and coalescent theory as a compact summary of polymorphism data. With genome- and exome-scale sequencing it became a practical inference target, and analyses of large human samples in the early 2010s used the spectrum to demonstrate recent explosive growth and the resulting excess of rare variants.

Key figures

  • Fumio Tajima
  • Simon Gravel
  • Laurent Excoffier
  • Carlos Bustamante

Seminal works

  • Tajima (1989)
  • Gravel et al. (2011)
  • Tennessen et al. (2012)

Frequently asked questions

What is the difference between the folded and unfolded spectrum?
The unfolded spectrum distinguishes the derived allele from the ancestral one, which requires knowing which allele is ancestral; the folded spectrum, used when ancestral state is unknown, instead tallies the minor allele count.
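
Folding maps derived counts i and n − i to the same minor-allele class, so an unfolded spectrum can always be folded but not the reverse. A minimal sketch, assuming the same layout as an unfolded SFS (entry i − 1 holds derived count i); `fold_sfs` is a hypothetical helper name. For even n, the class n/2 receives only the single entry i = n/2, since the two alleles are then equally frequent.

```python
import numpy as np


def fold_sfs(unfolded):
    """Fold an unfolded SFS (entry i-1 = sites with derived count i, i = 1..n-1).

    Returns minor-allele classes 1..floor(n/2); derived counts i and n-i are
    merged because, without ancestral state, they cannot be told apart.
    """
    u = np.asarray(unfolded)
    n = len(u) + 1
    folded = np.zeros(n // 2, dtype=u.dtype)
    for i in range(1, n):
        folded[min(i, n - i) - 1] += u[i - 1]
    return folded


# n = 6 sequences: unfolded entries for derived counts 1..5
print(fold_sfs([5, 3, 2, 1, 4]).tolist())  # [9, 4, 2]
```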
Why does the frequency spectrum show so many rare variants?
Even under a simple neutral model rare variants are expected to predominate, and recent human population growth has amplified this further, producing a large excess of very rare variants in large samples.
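
Writing ξ_i for the number of sites with derived count i and S for the total number of segregating sites, the neutral expectation θ/i implies that the expected share of each count class does not depend on θ. A worked example for a sample of n = 10 sequences (an arbitrary size chosen for illustration):

```latex
\[
\frac{\mathbb{E}[\xi_i]}{\mathbb{E}[S]} = \frac{\theta/i}{\theta\,a_n} = \frac{1/i}{a_n},
\qquad a_n = \sum_{j=1}^{n-1} \frac{1}{j}
\]
\[
n = 10:\quad a_{10} \approx 2.829,\qquad
\frac{1}{a_{10}} \approx 0.353,\qquad
\frac{1/2}{a_{10}} \approx 0.177
\]
```

So even without population growth, about 35% of segregating sites are expected to be singletons and about 53% to be singletons or doubletons.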
