Functional Annotation of Genomic Variants
Sequencing a genome yields millions of variants, but most are of unknown consequence. Functional annotation is the process of attaching biological meaning to each variant — where it falls, what gene or regulatory element it affects, and how likely it is to alter function — so that the few variants that matter can be distinguished from the many that do not.
Definition
Functional annotation of genomic variants is the assignment of biological context and predicted functional consequence to sequence variants, including their genomic location, affected genes or regulatory elements, molecular effect (such as missense, nonsense, splice-altering, or regulatory), and a predicted impact on function.
Scope
This topic covers the annotation of single-nucleotide variants, insertions, deletions, and structural variants: locating variants relative to genes and regulatory regions, classifying their molecular consequence, and predicting deleteriousness for coding and non-coding sites. It treats annotation as a methodological and reference subject and does not provide variant interpretation for individual clinical cases.
Core questions
- Where does a variant fall relative to genes, exons, splice sites, and regulatory elements?
- What is its molecular consequence — does it change a protein, disrupt splicing, or affect regulation?
- How likely is the variant to be deleterious to function?
- How can non-coding variants, which lack a simple protein-changing readout, be interpreted?
Key concepts
- Variant location and consequence classification
- Missense, nonsense, frameshift, and splice variants
- Deleteriousness prediction for coding variants
- Non-coding and regulatory variant annotation
- Reference annotation sources (gene models, conservation, functional element maps)
- Expression quantitative trait loci (eQTLs)
Mechanisms
Annotation pipelines first map each variant onto a reference genome and a set of gene models to determine its position and basic consequence (whether it lies in a coding exon, a splice site, an untranslated region, an intron, or an intergenic region), using tools such as ANNOVAR and SnpEff. For coding variants that change an amino acid, prediction algorithms such as SIFT estimate whether the substitution is tolerated or damaging, drawing on how well the affected position is conserved among homologous protein sequences. Non-coding variants are harder to interpret because they do not directly alter a protein sequence; here annotation relies on maps of functional elements such as those from ENCODE and on links between genetic variants and gene expression (eQTLs) catalogued by projects such as GTEx. The output is a layered description of each variant (location, molecular consequence, predicted impact, and supporting functional evidence) that supports downstream prioritisation.
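The first step described above, placing a variant relative to a gene model, can be sketched in a few lines of Python. This is a minimal illustration and not how ANNOVAR or SnpEff are implemented: the `Transcript` class, the `locate` function, the two-base splice window, and the toy coordinates are all assumptions made for the example, and the sketch handles only a single forward-strand transcript and single-base variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical forward-strand transcript; coordinates are 1-based, inclusive."""
    name: str
    exons: tuple      # ((start, end), ...) sorted by position
    cds_start: int    # first coding base
    cds_end: int      # last coding base


# Assumption: intronic bases within 2 bp of an exon edge count as the splice site.
SPLICE_WINDOW = 2


def locate(pos: int, tx: Transcript) -> str:
    """Return a coarse location label for a single-base variant at `pos`."""
    tx_start, tx_end = tx.exons[0][0], tx.exons[-1][1]
    if pos < tx_start or pos > tx_end:
        return "intergenic"

    # Exonic positions: split into UTRs and coding sequence.
    for start, end in tx.exons:
        if start <= pos <= end:
            if pos < tx.cds_start:
                return "5_prime_UTR"
            if pos > tx.cds_end:
                return "3_prime_UTR"
            return "coding_exon"

    # Intronic positions: check proximity to an exon boundary.
    last = len(tx.exons) - 1
    for i, (start, end) in enumerate(tx.exons):
        near_donor = i < last and end < pos <= end + SPLICE_WINDOW
        near_acceptor = i > 0 and start - SPLICE_WINDOW <= pos < start
        if near_donor or near_acceptor:
            return "splice_site"
    return "intron"


# Toy gene model with three exons; the coding sequence runs from 150 to 550.
TOY_TX = Transcript("TOY1", ((100, 200), (300, 400), (500, 600)), 150, 550)
labels = {p: locate(p, TOY_TX) for p in (50, 120, 180, 201, 250, 299, 350, 580)}
```

Real annotation engines additionally handle reverse-strand genes, multiple overlapping transcripts, indels that span feature boundaries, and standardised consequence vocabularies, so a single variant can receive several labels rather than one.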
Clinical relevance
Variant annotation is a foundational step in genomic research and in the analytical pipelines used to interpret sequencing data. It describes how candidate variants are characterised and prioritised; the predictions it produces are computational hypotheses and are not, on their own, a determination of pathogenicity or a basis for individual diagnostic or treatment decisions.
History
As high-throughput sequencing made whole-exome and whole-genome data routine in the late 2000s, the bottleneck shifted from generating variants to interpreting them. Conservation-based predictors such as SIFT (first described in 2001, with a widely cited protocol published in 2009) addressed coding variants, while general annotation engines such as ANNOVAR (2010) and SnpEff (2012) systematised the assignment of consequence across variant types. Large functional-element catalogues such as ENCODE (2012) and expression resources such as GTEx (2015) then extended interpretation to the non-coding genome, where the great majority of variants lie.
Debates
- How should non-coding variants be interpreted?
- Coding variants have a relatively interpretable molecular readout, but most variation is non-coding and lacks a direct protein consequence; interpreting it depends on functional-element maps and eQTL evidence, and the incompleteness and tissue specificity of both remain limiting factors.
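A minimal Python sketch of the two evidence layers mentioned above (overlap with a functional-element map and a lookup in an eQTL catalogue) makes the limitation concrete. The element intervals, the eQTL entries, the gene and tissue names, and the `annotate_noncoding` function are all hypothetical; they do not reflect the file formats or contents of ENCODE or GTEx.

```python
from bisect import bisect_right

# Hypothetical regulatory elements on one chromosome: (start, end, label).
# Assumed to be non-overlapping and sorted by start; coordinates are inclusive.
ELEMENTS = [
    (1_000, 1_500, "promoter"),
    (4_000, 4_800, "enhancer"),
    (9_000, 9_200, "CTCF_site"),
]
STARTS = [start for start, _, _ in ELEMENTS]

# Hypothetical eQTL catalogue: variant position -> [(gene, tissue), ...].
EQTLS = {
    4_250: [("GENE_A", "liver")],
    7_000: [("GENE_B", "whole_blood")],
}


def annotate_noncoding(pos: int) -> dict:
    """Collect element-overlap and eQTL evidence for a non-coding position."""
    # Find the last element starting at or before `pos`, then test its end.
    i = bisect_right(STARTS, pos) - 1
    element = None
    if i >= 0 and ELEMENTS[i][0] <= pos <= ELEMENTS[i][1]:
        element = ELEMENTS[i][2]
    return {"pos": pos, "element": element, "eqtl": EQTLS.get(pos, [])}


in_enhancer = annotate_noncoding(4_250)   # element overlap and an eQTL hit
eqtl_only = annotate_noncoding(7_000)     # eQTL hit but no mapped element
no_evidence = annotate_noncoding(500)     # neither layer has anything to say
```

An empty result such as `no_evidence` shows only that the maps used contain nothing at that position. It does not show that the variant is neutral, because the relevant tissue or cell type may simply not have been assayed.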
Key figures
- Kai Wang
- Pauline Ng
- Steven Henikoff
- Pablo Cingolani
Related topics
Seminal works
- kumar-2009
- wang-2010
- cingolani-2012
- encode-2012
Frequently asked questions
- What does it mean to annotate a variant?
- It means attaching biological context to the variant: where it sits relative to genes and regulatory elements, what molecular consequence it has, and how likely it is to affect function — so that important variants can be told apart from neutral ones.
- Why are non-coding variants harder to annotate than coding ones?
- Coding variants can be read against the genetic code to predict a protein change, but non-coding variants have no such direct readout; interpreting them relies on maps of regulatory elements and on links between variants and gene expression, which are still incomplete.
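The "direct readout" available for coding variants can be illustrated with a short Python sketch that translates a reference codon and its altered form using the standard genetic code and compares the results. The function name `classify_snv` and the consequence labels are chosen for this example only; the sketch assumes a single-base substitution within one forward-strand codon and ignores indels, splice effects, and alternative genetic codes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with amino acids listed in TCAG order for codon
# positions 1, 2, and 3 (third position varying fastest); "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(codon: str, offset: int, alt: str) -> str:
    """Classify a substitution at `offset` (0, 1, or 2) within `codon`."""
    alt_codon = codon[:offset] + alt + codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"      # a stop codon is gained
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


examples = {
    "GAG>GTG": classify_snv("GAG", 1, "T"),  # Glu -> Val
    "TGG>TGA": classify_snv("TGG", 2, "A"),  # Trp -> stop
    "CTG>CTA": classify_snv("CTG", 2, "A"),  # Leu -> Leu
    "TAA>CAA": classify_snv("TAA", 0, "C"),  # stop -> Gln
}
```

No comparable lookup table exists for a variant in an enhancer or an intron, which is why non-coding annotation falls back on functional-element maps and expression associations.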