ScholarGate

Genome-Wide Association Studies and Variant Discovery

A genome-wide association study (GWAS) scans hundreds of thousands to millions of genetic variants across the genomes of many individuals to find positions where allele frequency differs systematically between people with and without a trait or disease. By testing the whole genome without a prior hypothesis about which gene is involved, GWAS turned the search for the genetic basis of common, complex conditions from a candidate-gene guessing game into a systematic, hypothesis-free discovery enterprise.

Definition

A genome-wide association study is an observational genetic study that tests for association between a phenotype and genetic variants, typically single-nucleotide polymorphisms (SNPs), genotyped or imputed across the entire genome. It declares association at variants whose statistical evidence survives a genome-wide significance threshold.

Scope

This area orients the reader to the family of methods and concepts that surround variant discovery in population samples of unrelated individuals: how a GWAS is designed and analysed, why linkage disequilibrium lets a sparse array tag untyped variants, why much trait heritability initially appeared 'missing', how ancestry differences can create spurious associations, and how rare-variant approaches extend discovery beyond common SNPs. It frames these as methodological reference topics within genomics, not as diagnostic or prescriptive clinical content.

Core questions

  • How can the whole genome be tested for association with a trait without a prior candidate gene?
  • Why does genotyping a fraction of variants capture information about the rest?
  • What significance threshold controls false positives across millions of tests?
  • Why did early GWAS findings explain only a small share of estimated heritability?
  • How do differences in ancestry between cases and controls distort association signals?
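The last question above can be illustrated with a small numerical sketch. All counts are hypothetical and were chosen so that the allele has no association with disease inside either ancestry group; the group names and the `freq_difference` helper are illustrative assumptions, not part of any GWAS toolkit.

```python
# Hypothetical allele counts for two ancestry groups (illustrative only).
# Within each group the alternate allele is equally common in cases and
# controls, but the groups differ in allele frequency and in case fraction.
strata = {
    "group_A": {"case_alt": 160, "case_total": 800, "ctrl_alt": 40, "ctrl_total": 200},
    "group_B": {"case_alt": 120, "case_total": 200, "ctrl_alt": 480, "ctrl_total": 800},
}


def freq_difference(case_alt, case_total, ctrl_alt, ctrl_total):
    """Alternate-allele frequency in cases minus that in controls."""
    return case_alt / case_total - ctrl_alt / ctrl_total


# Difference within each ancestry group: zero by construction.
within = {name: freq_difference(**counts) for name, counts in strata.items()}

# Difference after pooling the groups and ignoring ancestry.
pooled_counts = {
    key: sum(counts[key] for counts in strata.values())
    for key in ("case_alt", "case_total", "ctrl_alt", "ctrl_total")
}
pooled = freq_difference(**pooled_counts)

print(within)
print(round(pooled, 2))
```

Within each group the frequency difference is zero, yet the pooled sample shows the allele at 0.28 in cases against 0.52 in controls. The apparent signal comes entirely from the groups differing in both allele frequency and case fraction, which is why association analyses adjust for ancestry.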

Key concepts

  • Common disease, common variant hypothesis
  • Single-nucleotide polymorphism (SNP)
  • Linkage disequilibrium and tag SNPs
  • Genome-wide significance threshold (~5 x 10^-8)
  • Genotype imputation from reference panels
  • Polygenic architecture and effect sizes
  • Population stratification
  • Missing heritability
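To make the linkage disequilibrium and tag SNP concept concrete, the sketch below computes the common pairwise LD measure r^2 between two biallelic SNPs. It assumes phased haplotype counts are available (real genotype data usually need phasing or an estimation step first), and the function name and example counts are hypothetical.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic SNPs from phased haplotype counts.

    n_AB counts haplotypes carrying allele A at the first SNP and
    allele B at the second, and so on for the other three combinations.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n          # frequency of allele A
    p_B = (n_AB + n_aB) / n          # frequency of allele B
    d = n_AB / n - p_A * p_B         # LD coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Hypothetical counts: the two SNPs always travel together.
perfect_tag = ld_r_squared(50, 0, 0, 50)

# Hypothetical counts: alleles at the two SNPs combine independently.
no_ld = ld_r_squared(25, 25, 25, 25)

print(perfect_tag, no_ld)
```

An r^2 near 1 means genotyping one SNP recovers nearly all information about the other, so it can serve as a tag; an r^2 near 0 means the typed SNP says little about the untyped one.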

Mechanisms

A GWAS genotypes a dense panel of variants (or imputes them against a sequenced reference panel) and tests each variant for statistical association with the phenotype, usually via regression that adjusts for ancestry and other covariates. Because nearby variants are co-inherited in blocks of linkage disequilibrium, a typed marker can act as a proxy (tag) for untyped causal variants, so association at a marker localises a signal to a region rather than necessarily to the causal variant itself. The enormous number of tests requires a stringent genome-wide significance threshold to control false positives, and findings are confirmed by replication in independent samples. Most discovered variants are common, individually small in effect, and frequently in non-coding regulatory regions, consistent with a highly polygenic architecture for common traits.
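The per-variant test can be sketched in its simplest form. The example below uses an allelic chi-square test on a 2 x 2 table of allele counts, which is a deliberately simplified stand-in for the covariate-adjusted regression described above: it makes no adjustment for ancestry or other covariates. The function name and the counts are hypothetical.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Returns the chi-square statistic and its p-value. No covariate
    adjustment is performed, unlike a regression-based GWAS analysis.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical variant: alternate allele frequency 0.30 in cases, 0.25 in controls.
chi2, p_value = allelic_chi_square(600, 1400, 500, 1500)

genome_wide_threshold = 5e-8
print(chi2, p_value, p_value < genome_wide_threshold)
```

With these counts the variant looks significant at a conventional 0.05 level but falls far short of the genome-wide threshold, which illustrates why small-effect variants need very large samples to be detected.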

Clinical relevance

GWAS have mapped thousands of robust variant-trait associations that inform understanding of disease biology, drug-target prioritisation, and the construction of polygenic scores. As a reference area, this topic explains how population-scale genetic evidence is generated and interpreted; it describes methods and findings and is not a basis for individual diagnosis, risk counselling, or treatment decisions.

Epidemiology

Since the first wave of studies around 2005-2007, GWAS have been applied to hundreds of diseases and quantitative traits in cohorts ranging from thousands to millions of participants, and curated repositories such as the NHGRI-EBI GWAS Catalog now record tens of thousands of associations. A persistent limitation is that the great majority of participants have been of European ancestry, which constrains the transferability of findings and polygenic scores to other populations.

Evidence & guidelines

Methodological standards for GWAS were consolidated through large consortium efforts and review syntheses rather than clinical practice guidelines. The Wellcome Trust Case Control Consortium study (2007) is a canonical demonstration of the shared-control, multi-disease design, and review articles by McCarthy et al. (2008) and Visscher et al. (2012, 2017) articulate consensus expectations on significance thresholds, quality control, replication, and interpretation.

History

The approach became feasible once dense SNP maps and the HapMap project characterised genome-wide linkage disequilibrium, and once affordable genotyping arrays appeared in the mid-2000s. The 2007 Wellcome Trust Case Control Consortium study, testing seven common diseases against shared controls, demonstrated the design at scale and catalysed a rapid expansion of association mapping. Subsequent reviews tracked the field's maturation from a handful of loci to genome-wide catalogues, and its reckoning with missing heritability, population diversity, and the move toward rare-variant and whole-genome sequencing studies.

Debates

How much of common-trait heritability can GWAS recover?
Early GWAS loci explained only a small fraction of estimated heritability, prompting debate over whether the gap reflects many undetected small-effect common variants, rare variants, structural variation, or overestimated heritability; later whole-genome methods narrowed but did not close the gap.
Does the European-ancestry bias of GWAS limit equity and validity?
Because most participants have been of European ancestry, discovered associations and polygenic scores transfer imperfectly to other populations, raising both scientific concerns about generalisability and equity concerns about who benefits from genomic medicine.

Key figures

  • Peter Visscher
  • Mark McCarthy
  • Joel Hirschhorn
  • Naomi Wray
  • Jian Yang

Seminal works

  • Wellcome Trust Case Control Consortium (2007)
  • McCarthy et al. (2008)
  • Visscher et al. (2012)
  • Visscher et al. (2017)

Frequently asked questions

What is the difference between a GWAS and a linkage study?
Linkage studies follow co-segregation of markers and disease within families and locate broad chromosomal regions, whereas a GWAS tests association across unrelated individuals at fine genome-wide resolution, making it better suited to common variants of small effect.
Why do GWAS use such a strict significance threshold?
Because millions of variants are tested, applying the conventional p < 0.05 cutoff to each test would yield enormous numbers of false positives. The genome-wide threshold near 5 x 10^-8 is approximately a Bonferroni correction of 0.05 for about one million effectively independent common-variant tests across the genome.
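The arithmetic behind the threshold can be sketched in a few lines. The figure of one million independent tests is the commonly used approximation and is treated here as an assumption; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


alpha = 0.05
independent_tests = 1_000_000  # assumed effective number of independent common-variant tests

# Expected number of false positives if each test used p < 0.05 and no variant were truly associated.
expected_false_positives = alpha * independent_tests

# Per-test cutoff after Bonferroni correction.
threshold = bonferroni_threshold(alpha, independent_tests)

print(expected_false_positives, threshold)
```

Under these assumptions an uncorrected analysis would be expected to flag about 50,000 variants by chance alone, while the corrected cutoff comes out at 5 x 10^-8.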
