Evolutionary Conservation and Constraint Metrics
Sequence that has changed little across species is said to be conserved, and sequence that carries fewer variants than expected within a species is said to be constrained. Both are signatures that purifying selection has removed deleterious changes. Conservation and constraint metrics turn this evolutionary signal into quantitative scores that flag functionally important positions in the genome.
Definition
Evolutionary conservation is the persistence of a sequence across species through purifying selection; constraint is the corresponding depletion of variation within a species. Conservation and constraint metrics are scores that quantify, position by position or gene by gene, how strongly selection has acted against change.
Scope
The entry covers cross-species conservation scores and within-species constraint metrics, the evolutionary logic that connects them to function, and their role in prioritising variants. It is a methodological topic describing how scores are derived and interpreted, not a tool for assigning clinical significance to a particular variant.
Core questions
- Why does conservation across species indicate biological function?
- How are per-base conservation scores computed from multiple alignments?
- How is constraint measured from the deficit of variation within a species?
- How are these metrics used to prioritise candidate variants?
Key concepts
- Purifying (negative) selection
- Cross-species conservation scores
- phastCons and phyloP
- Within-species constraint
- Loss-of-function intolerance
- Observed vs expected variant counts
Key theories
- Purifying selection and reduced substitution rate
- Functionally important sites experience purifying (negative) selection that removes deleterious mutations, lowering their substitution rate between species and depleting segregating variation within a species; conservation and constraint scores read this reduced rate or depletion as evidence of function.
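The link between purifying selection and a lower substitution rate can be written out with the standard neutral-theory result. The symbols below (N, \mu, u, k, f) are notation introduced here for illustration and do not appear elsewhere in this entry. Splitting mutations into just two classes, neutral and strongly deleterious, is a simplifying assumption.

```latex
% k   : substitution rate per site per generation
% N   : diploid population size
% \mu : mutation rate per site per generation
% u   : fixation probability of a new mutation
k = 2N\mu\,u

% A neutral mutation fixes with probability u = 1/(2N), so
k_{\text{neutral}} = 2N\mu \cdot \frac{1}{2N} = \mu

% If only a fraction f of mutations at a site is neutral and the rest
% are removed by purifying selection (u \approx 0), then
k_{\text{site}} \approx f\,\mu, \qquad 0 \le f \le 1
```

Under these assumptions, a site where most changes are deleterious (small f) accumulates substitutions more slowly than a neutral site. That shortfall relative to the neutral rate is what conservation scores are designed to detect.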
Mechanisms
Conservation is inferred by comparing homologous sequences across species in a multiple alignment and asking where substitutions are rarer than a neutral rate would predict; methods such as phastCons and phyloP formalise this on a phylogeny. Constraint instead uses within-species data, comparing the number of variants observed in a gene or region with the number expected under a neutral mutational model. A substantial deficit, that is, an observed/expected ratio well below 1, indicates intolerance to variation, especially to loss-of-function changes. Large human sequencing datasets have made it possible to quantify this constraint genome-wide.
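The observed-versus-expected comparison can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any published constraint metric. It assumes that the observed variant count is Poisson-distributed around `ratio * expected`, and it scans a grid of candidate ratios to obtain an uncertainty interval. The function names, the grid bounds and the 90% level are all hypothetical choices made for this example.

```python
import math


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k events under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def oe_with_interval(observed: int, expected: float, level: float = 0.90,
                     grid_max: float = 2.0, steps: int = 2000):
    """Return (o/e ratio, lower bound, upper bound).

    Assumption for illustration: observed ~ Poisson(ratio * expected).
    The bounds come from normalising the likelihood over a grid of
    candidate ratios in (0, grid_max] and reading off its quantiles.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")

    # Candidate ratios: grid_max/steps, 2*grid_max/steps, ..., grid_max
    ratios = [grid_max * (i + 1) / steps for i in range(steps)]
    logliks = [poisson_logpmf(observed, r * expected) for r in ratios]

    # Subtract the peak before exponentiating to avoid underflow.
    peak = max(logliks)
    weights = [math.exp(ll - peak) for ll in logliks]
    total = sum(weights)

    tail = (1.0 - level) / 2.0
    lower = upper = None
    cumulative = 0.0
    for r, w in zip(ratios, weights):
        cumulative += w / total
        if lower is None and cumulative >= tail:
            lower = r
        if upper is None and cumulative >= 1.0 - tail:
            upper = r
            break

    return observed / expected, lower, upper


# Hypothetical gene: 5 loss-of-function variants seen where 50 were expected.
ratio, low, high = oe_with_interval(observed=5, expected=50.0)
print(f"o/e = {ratio:.2f}, interval = ({low:.3f}, {high:.3f})")
```

Reporting the upper bound of the interval rather than the point estimate is a conservative choice: a gene with few expected variants gets a wide interval, so it is not called constrained on the basis of sparse data alone.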
Clinical relevance
Conservation and constraint metrics are widely used as supporting evidence when prioritising candidate variants in research and diagnostic genomics, because variants at constrained positions or in constrained genes are, on average, more likely to be functionally consequential. These scores are population- and evolution-level summaries; this entry describes how they are derived and is not a basis for individual diagnostic or treatment decisions.
Evidence & guidelines
Phylogenetic methods such as phastCons and phyloP established cross-species conservation scoring. Large-scale human sequencing efforts later quantified within-species constraint and loss-of-function intolerance across genes, providing the empirical basis for the constraint metrics now used in variant prioritisation.
History
Comparative sequence analysis long suggested that conserved regions were functional, and the availability of many aligned genomes in the 2000s allowed quantitative conservation scores to be computed genome-wide. With very large human cohorts in the 2010s, attention extended to within-species constraint, yielding gene-level intolerance metrics and a genome-wide map of mutational constraint.
Key figures
- Adam Siepel
- David Haussler
- Katherine Pollard
- Daniel MacArthur
Related topics
Seminal works
- siepel-2005
- lek-2016
- karczewski-2020
Frequently asked questions
- What is the difference between conservation and constraint?
- Conservation is measured across species, from how little a sequence has changed over evolutionary time; constraint is measured within a species, from how much variation is missing relative to neutral expectation. Both reflect purifying selection but use different data.
- Does a high constraint score mean a variant is pathogenic?
- No. Constraint and conservation are statistical signals of functional importance averaged over positions or genes; they can support variant prioritisation but do not by themselves establish that any individual variant causes disease.