Representation Bias in Pharmacogenomics Research

Representation bias is the systematic overrepresentation of some populations (overwhelmingly people of European ancestry) and underrepresentation of others in the cohorts, reference databases, and validation studies that underpin pharmacogenomics. Because pharmacogenetic discovery and annotation are calibrated to the populations studied, this imbalance leaves the evidence base less complete and less reliable for underrepresented groups.

Definition

The systematic skew in pharmacogenomic evidence whereby discovery cohorts, reference panels, and validation studies disproportionately sample particular populations (chiefly those of European ancestry), producing findings and tools that generalize unevenly across humanity.

Scope

This topic documents the scale of the diversity gap in genomic and pharmacogenomic research, the mechanisms by which it biases findings, and its downstream consequences for equity. It is a methodological and reference overview; it offers no clinical recommendations.

Core questions

How unevenly are populations represented in genomic and pharmacogenomic research?
Through what mechanisms does underrepresentation bias discovery, annotation, and validation?
What are the equity consequences of the diversity gap?
Which initiatives are expanding the diversity of genomic data?
How does representation bias interact with the transferability of genetic predictors?

Key concepts

Overrepresentation and underrepresentation
Ascertainment bias
Reference panel composition
Transferability (portability) of predictors
Variant interpretation and reclassification
Diverse biobanks and consortia (e.g., TOPMed, H3Africa, All of Us)
Health disparities

Mechanisms

Bias enters at several stages. Discovery cohorts drawn mainly from European-ancestry participants identify the variants common in that group, so variants prevalent elsewhere are less likely to be found or functionally annotated. Reference panels assembled from skewed data impute and interpret variation better for well-represented populations. Validation studies conducted in the same groups confirm performance there but leave generalizability untested elsewhere. As a consequence, genetic predictors, including polygenic scores, tend to transfer poorly to underrepresented populations, and variants in those populations are more often classified as variants of uncertain significance. African populations, which harbor the greatest human genetic diversity, are especially affected because much of their variation is simply absent from European-centric resources.
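
The discovery-stage mechanism can be made concrete with a little arithmetic. The Python sketch below is an illustration under simplifying assumptions (diploid participants, independent sampling, no relatedness or population structure), not a method taken from the studies cited in this entry. The group names, allele frequencies, and cohort shares are hypothetical and come from no dataset. It shows how the expected frequency of an allele in a cohort, and therefore the chance that the allele is observed at all, depends on who was recruited.

```python
from typing import Mapping


def cohort_allele_freq(freq_by_group: Mapping[str, float],
                       share_by_group: Mapping[str, float]) -> float:
    """Expected allele frequency in a cohort: each group's frequency
    weighted by that group's share of the cohort."""
    total_share = sum(share_by_group.values())
    if total_share <= 0:
        raise ValueError("cohort shares must sum to a positive number")
    weighted = sum(freq_by_group[g] * share for g, share in share_by_group.items())
    return weighted / total_share


def prob_allele_observed(allele_freq: float, n_participants: int) -> float:
    """Probability the allele appears at least once among 2 * n chromosomes,
    assuming diploid participants and independent draws."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    if n_participants < 0:
        raise ValueError("n_participants must be non-negative")
    return 1.0 - (1.0 - allele_freq) ** (2 * n_participants)


# Hypothetical inputs for illustration only: an allele that is rare in
# group_a and common in group_b, and two possible cohort compositions.
freq_by_group = {"group_a": 0.0001, "group_b": 0.05}
skewed_cohort = {"group_a": 0.95, "group_b": 0.05}
balanced_cohort = {"group_a": 0.50, "group_b": 0.50}

for label, shares in (("skewed", skewed_cohort), ("balanced", balanced_cohort)):
    freq = cohort_allele_freq(freq_by_group, shares)
    expected_copies = 2 * 1000 * freq          # copies among 1,000 participants
    p_seen = prob_allele_observed(freq, 100)   # chance of any copy in 100 participants
    print(f"{label}: frequency={freq:.5f}, "
          f"expected copies per 1,000={expected_copies:.1f}, "
          f"P(seen in 100)={p_seen:.3f}")
```

With these made-up inputs the balanced cohort carries nearly ten times as many copies of the allele as the skewed one. Fewer copies mean less statistical power to detect an association, and fewer observations from which to annotate the variant's function.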

Clinical relevance

Representation bias is central to judging whether a pharmacogenomic finding or tool is trustworthy for a given population. This entry describes how the bias arises and what it implies for the completeness of evidence; it is not clinical guidance and does not address testing or treatment for individuals.

Epidemiology

Audits of genome-wide association studies and other genomic studies have repeatedly found that participants of European ancestry make up a large majority of those studied, far above their share of the global population. African, Latin American, and many Asian and Indigenous populations remain markedly underrepresented, and the gap has narrowed only slowly despite sustained attention.
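
One simple way to express "far above their share" is a representation ratio: a group's share of study participants divided by its share of the reference population. The sketch below assumes that definition, which is an illustrative choice rather than a metric prescribed by the audits mentioned above. The counts are hypothetical placeholders, not audit results. A ratio above 1 indicates overrepresentation; a ratio below 1 indicates underrepresentation.

```python
from typing import Dict, Mapping


def representation_ratios(study_counts: Mapping[str, int],
                          population_counts: Mapping[str, int]) -> Dict[str, float]:
    """Ratio of each group's share of study participants to its share
    of the reference population (1.0 means proportional representation)."""
    study_total = sum(study_counts.values())
    population_total = sum(population_counts.values())
    if study_total <= 0 or population_total <= 0:
        raise ValueError("totals must be positive")

    ratios = {}
    for group, pop_count in population_counts.items():
        if pop_count <= 0:
            raise ValueError(f"population count for {group!r} must be positive")
        study_share = study_counts.get(group, 0) / study_total
        population_share = pop_count / population_total
        ratios[group] = study_share / population_share
    return ratios


# Hypothetical counts for illustration only; they do not describe any real audit.
study_counts = {"group_a": 800, "group_b": 150, "group_c": 50}
population_counts = {"group_a": 200, "group_b": 500, "group_c": 300}

for group, ratio in representation_ratios(study_counts, population_counts).items():
    status = "over" if ratio > 1 else "under" if ratio < 1 else "proportionally "
    print(f"{group}: ratio={ratio:.2f} ({status}represented)")
```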

History

Concern about the lack of diversity in genomics was crystallized by Popejoy and Fullerton's 2016 analysis showing the dominance of European-ancestry participants, and reinforced by Sirugo and colleagues' 2019 review documenting the persistence of the gap. Martin and colleagues (2019) demonstrated a concrete harm: polygenic scores performed worse in non-European populations, which could widen existing health disparities. Initiatives such as the NHLBI TOPMed program, H3Africa, and large diverse biobanks emerged in part to address the imbalance.

Debates

Why has the diversity gap persisted despite attention? Explanations include entrenched recruitment infrastructure, funding patterns, historical mistrust among marginalized communities, and methodological convenience; commentators disagree on which levers most effectively close the gap.

Key figures

Alice B. Popejoy
Stephanie M. Fullerton
Sarah Tishkoff
Alicia R. Martin
Charles Rotimi

Seminal works

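Popejoy and Fullerton (2016), "Genomics is failing on diversity," Nature
Sirugo, Williams, and Tishkoff (2019), "The missing diversity in human genetic studies," Cell
Martin and colleagues (2019), "Clinical use of current polygenic risk scores may exacerbate health disparities," Nature Genetics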

Frequently asked questions

Why does it matter that most genomic research is in European-ancestry populations? Pharmacogenomic discovery, variant interpretation, and predictive tools are calibrated to the populations studied. When those are mostly European-ancestry, the resulting knowledge is less complete and less reliable for other populations, contributing to health disparities.
Does adding diverse data fix the problem? Expanding the diversity of cohorts and reference panels is necessary and helps, but it must be paired with analytic methods that transfer across populations and with attention to community trust and equitable benefit.