Functional Annotation of Genomic Variants

Sequencing a genome yields millions of variants, but most are of unknown consequence. Functional annotation is the process of attaching biological meaning to each variant — where it falls, what gene or regulatory element it affects, and how likely it is to alter function — so that the few variants that matter can be distinguished from the many that do not.

Definition

Functional annotation of genomic variants is the assignment of biological context and predicted functional consequence to sequence variants, including their genomic location, affected genes or regulatory elements, molecular effect (such as missense, nonsense, splice-altering, or regulatory), and a predicted impact on function.

Scope

This topic covers the annotation of single-nucleotide variants, insertions, deletions, and structural changes: locating variants relative to genes and regulatory regions, classifying their molecular consequence, and predicting deleteriousness for coding and non-coding sites. It treats annotation as a methodological and reference subject and does not provide variant interpretation for individual clinical cases.

Core questions

Where does a variant fall relative to genes, exons, splice sites, and regulatory elements?
What is its molecular consequence — does it change a protein, disrupt splicing, or affect regulation?
How likely is the variant to be deleterious to function?
How can non-coding variants, which lack a simple protein-changing readout, be interpreted?

Key concepts

Variant location and consequence classification
Missense, nonsense, frameshift, and splice variants
Deleteriousness prediction for coding variants
Non-coding and regulatory variant annotation
Reference annotation sources (gene models, conservation, functional element maps)
Expression quantitative trait loci (eQTLs)
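To make the location and consequence classes above concrete, here is a minimal Python sketch. It assumes a single hypothetical transcript on the forward strand, 1-based inclusive coordinates, and VCF-style alleles, and it treats only the two intronic bases on each side of an exon as a splice site. The names `Transcript` and `classify` and the label strings are illustrative, not the API or vocabulary of any annotation tool. Production annotators such as ANNOVAR and SnpEff also handle the reverse strand, multiple overlapping transcripts, and finer-grained categories, none of which this sketch attempts.

```python
from dataclasses import dataclass

# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


@dataclass
class Transcript:
    """Hypothetical single transcript on the forward strand; 1-based inclusive coordinates."""
    exons: list[tuple[int, int]]  # sorted (start, end) exon intervals
    cds_start: int                # first base of the start codon
    cds_end: int                  # last base of the stop codon
    genome: str                   # reference sequence; genome[0] is position 1

    def _coding_parts(self):
        # Yield (lo, hi) for the coding portion of each exon, in transcript order.
        for start, end in self.exons:
            lo, hi = max(start, self.cds_start), min(end, self.cds_end)
            if lo <= hi:
                yield lo, hi

    def cds(self) -> str:
        # Spliced coding sequence: coding parts of all exons, concatenated.
        return "".join(self.genome[lo - 1:hi] for lo, hi in self._coding_parts())

    def cds_offset(self, pos: int) -> int:
        # 0-based offset of a genomic position within the spliced coding sequence.
        offset = 0
        for lo, hi in self._coding_parts():
            if lo <= pos <= hi:
                return offset + (pos - lo)
            offset += hi - lo + 1
        raise ValueError(f"position {pos} is not in the coding sequence")


def classify(tx: Transcript, pos: int, ref: str, alt: str) -> str:
    """Return a coarse consequence label for a VCF-style variant at `pos`."""
    if pos < tx.exons[0][0] or pos > tx.exons[-1][1]:
        return "intergenic"
    if not any(start <= pos <= end for start, end in tx.exons):
        # Intronic: only the two intronic bases adjacent to an exon boundary count as splice sites.
        near_boundary = any(0 < start - pos <= 2 or 0 < pos - end <= 2 for start, end in tx.exons)
        return "splice_site" if near_boundary else "intron"
    if not (tx.cds_start <= pos <= tx.cds_end):
        return "UTR"
    if len(ref) != len(alt):
        # Indels: a length change that is not a multiple of 3 shifts the reading frame.
        return "frameshift" if (len(alt) - len(ref)) % 3 else "inframe_indel"
    if len(ref) != 1:
        raise ValueError("this sketch only handles single-base substitutions")
    cds, off = tx.cds(), tx.cds_offset(pos)
    if cds[off] != ref:
        raise ValueError("REF allele does not match the reference sequence")
    # Rebuild the affected codon with the ALT base and translate both versions.
    codon_start, frame = off - off % 3, off % 3
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:frame] + alt + ref_codon[frame + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


# Toy locus (hypothetical): UTRs at 3-4 and 23-28; CDS ATG GCC | TGG TAA split by an intron at 11-16.
genome = "CC" + "AC" + "ATGGCC" + "GTCCAG" + "TGGTAA" + "GCATGC" + "TT"
tx = Transcript(exons=[(3, 10), (17, 28)], cds_start=5, cds_end=22, genome=genome)
for pos, ref, alt in [(1, "C", "T"), (12, "T", "A"), (9, "C", "A"), (19, "G", "A"), (7, "G", "GA")]:
    print(classify(tx, pos, ref, alt))
# prints one label per line: intergenic, splice_site, missense, nonsense, frameshift
```

Substitutions are classified by rebuilding the affected codon and translating it with the standard genetic code, while indels are classified only by whether their length change is a multiple of three.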

Mechanisms

Annotation pipelines first map each variant onto a reference genome and a set of gene models to determine its position and basic consequence: whether it lies in a coding exon, a splice site, an untranslated region, an intron, or an intergenic region. Tools such as ANNOVAR and SnpEff automate this step. For coding variants that change an amino acid, prediction algorithms such as SIFT estimate whether the substitution is tolerated or damaging from how conserved the affected position is across homologous protein sequences from other species. Non-coding variants are harder to interpret because they do not change a protein; here annotation relies on maps of functional elements, such as those produced by ENCODE, and on statistical links between genetic variants and gene expression (eQTLs) catalogued by projects such as GTEx. The output is a layered description of each variant that supports downstream prioritisation.
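The conservation idea behind SIFT can be sketched as follows. This is a simplified, SIFT-inspired score, not the published algorithm. For one alignment column it estimates residue probabilities from counts plus a flat pseudocount, divides the probability of the substituted residue by that of the most probable residue, and calls the substitution damaging below 0.05, the cutoff commonly used with SIFT. The alignment, the `pseudocount` value, and the function names are hypothetical; the published method adds refinements such as sequence weighting and more elaborate pseudocounts.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def scaled_probability(alignment: list[str], column: int, aa: str, pseudocount: float = 0.1) -> float:
    """Probability of `aa` at `column`, divided by that of the most probable residue.

    Simplified and SIFT-inspired: raw counts plus a flat pseudocount; the sequence
    weighting and pseudocount priors of the published method are omitted.
    """
    residues = [seq[column] for seq in alignment if seq[column] in AMINO_ACIDS]  # drop gaps
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    probs = {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}
    return probs[aa] / max(probs.values())


def predict(alignment: list[str], column: int, alt_aa: str, threshold: float = 0.05) -> tuple[str, float]:
    """Label a substitution 'damaging' when its scaled probability falls below the threshold."""
    score = scaled_probability(alignment, column, alt_aa)
    return ("damaging" if score < threshold else "tolerated"), score


# Hypothetical alignment of one protein segment across six homologs.
alignment = ["MKWLE", "MRWIE", "MKWVD", "MKWLE", "MRWIE", "MKWVD"]
print(predict(alignment, 2, "A"))  # invariant W column: damaging, score about 0.016
print(predict(alignment, 3, "I"))  # column already holding L, I, V: tolerated, score 1.0
```

The invariant tryptophan column yields a very low score for alanine, whereas isoleucine in a column that already accommodates L, I, and V receives the maximum score.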

Clinical relevance

Variant annotation is a foundational step in genomic research and in the analytical pipelines used to interpret sequencing data. It describes how candidate variants are characterised and prioritised; the predictions it produces are computational hypotheses and are not, on their own, a determination of pathogenicity or a basis for individual diagnostic or treatment decisions.

History

As high-throughput sequencing made whole-exome and whole-genome data routine in the late 2000s, the bottleneck shifted from generating variants to interpreting them. Conservation-based predictors such as SIFT (first described in 2001, with a widely used protocol paper in 2009) were applied to coding variants, while general annotation engines such as ANNOVAR (2010) and SnpEff (2012) systematised the assignment of consequence across variant types. Large functional-element catalogues such as ENCODE (2012) and expression resources such as GTEx (2015) then extended interpretation to the non-coding genome, where the great majority of variants lie.

Debates

How should non-coding variants be interpreted?: Coding variants have a relatively interpretable molecular readout, but most variation is non-coding and lacks a direct protein consequence; interpreting it depends on functional-element maps and eQTL evidence whose completeness and tissue specificity remain limiting.
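The two evidence types named in this debate, overlap with a functional-element map and eQTL associations, can be combined in a short sketch that looks up eQTLs per tissue. All element coordinates, variant IDs, tissues, and gene names below are hypothetical placeholders rather than real ENCODE or GTEx records, and `annotate_noncoding` is an illustrative function, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A regulatory element from a hypothetical functional-element map (1-based, inclusive)."""
    chrom: str
    start: int
    end: int
    kind: str  # e.g. "enhancer", "promoter"


def annotate_noncoding(variant_id: str, chrom: str, pos: int,
                       elements: list[Element],
                       eqtls: set[tuple[str, str, str]],
                       tissue: str) -> dict[str, list[str]]:
    """Collect overlapping element types and eQTL target genes for one tissue.

    `eqtls` holds (variant_id, tissue, gene) triples. A linear scan keeps the sketch
    short; a larger implementation would typically use an interval index.
    """
    kinds = [e.kind for e in elements if e.chrom == chrom and e.start <= pos <= e.end]
    genes = sorted(g for (v, t, g) in eqtls if v == variant_id and t == tissue)
    return {"elements": kinds, "eqtl_genes": genes}


# Hypothetical annotation resources, not real ENCODE or GTEx records.
elements = [
    Element("chr1", 1000, 1500, "enhancer"),
    Element("chr1", 1400, 1600, "promoter"),
    Element("chr2", 1000, 2000, "enhancer"),
]
eqtls = {("var1", "liver", "GENE_A"), ("var1", "liver", "GENE_B"), ("var2", "brain", "GENE_C")}

print(annotate_noncoding("var1", "chr1", 1450, elements, eqtls, "liver"))
# {'elements': ['enhancer', 'promoter'], 'eqtl_genes': ['GENE_A', 'GENE_B']}
print(annotate_noncoding("var1", "chr1", 1450, elements, eqtls, "brain"))
# {'elements': ['enhancer', 'promoter'], 'eqtl_genes': []}
```

The same variant gains eQTL support when liver is queried and none when brain is queried, which is the tissue-specificity problem in miniature: an empty eQTL result may reflect missing data for the relevant tissue rather than the absence of a regulatory effect.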

Key figures

Kai Wang
Pauline Ng
Steven Henikoff
Pablo Cingolani

Seminal works

kumar-2009
wang-2010
cingolani-2012
encode-2012

Frequently asked questions

What does it mean to annotate a variant?: It means attaching biological context to the variant: where it sits relative to genes and regulatory elements, what molecular consequence it has, and how likely it is to affect function — so that important variants can be told apart from neutral ones.
Why are non-coding variants harder to annotate than coding ones?: Coding variants can be read against the genetic code to predict a protein change, but non-coding variants have no such direct readout; interpreting them relies on maps of regulatory elements and on links between variants and gene expression, which are still incomplete.