ScholarGate

Reference Genome Standards and Annotation

A reference genome is a curated, representative consensus sequence for a species that serves as the shared coordinate system against which new sequence data are aligned and interpreted. Maintaining it as versioned builds, and layering biological annotation onto it, is what makes genomic results comparable across studies, laboratories, and time.

Definition

A reference genome is a curated consensus nucleotide sequence chosen to represent a species' genome, maintained as versioned assemblies (builds) and annotated with the locations of genes and other functional elements, that provides a stable coordinate framework for aligning and interpreting genomic data.

Scope

The entry covers what a reference assembly is, how it is versioned into successive builds (such as the human GRCh38 build), the complete telomere-to-telomere assembly that followed, the role of annotation in marking genes and functional features, and the move toward more complete and representative references. It is a reference and infrastructural topic, not clinical guidance.

Core questions

  • What is a reference genome and why does the field standardise on one?
  • How and why are reference assemblies versioned into successive builds?
  • What does genome annotation add to a reference sequence?

Key concepts

  • Reference assembly (consensus sequence)
  • Genome build and versioning (e.g., GRCh38)
  • Genome annotation
  • Coordinate system for alignment
  • Telomere-to-telomere (gapless) assembly
  • Assembly gaps and finishing

Mechanisms

A reference genome is assembled from high-quality sequence data into a consensus that represents the species rather than any single individual, then released as a versioned build so that genomic coordinates remain stable and citable. Annotation overlays the sequence with the positions of genes, transcripts, and regulatory and repetitive elements, turning raw coordinates into biologically interpretable maps. Successive builds incorporate corrections, fill gaps, and improve representation; the human reference progressed from the 2001 draft and 2004 finished euchromatic sequence to the GRCh38 build and ultimately to a complete telomere-to-telomere assembly that resolved previously inaccessible regions.
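As a rough illustration of how annotation turns bare coordinates into biology, the sketch below treats annotation as a list of labelled intervals on one build and asks which features cover a given position. The `Feature` class, the `features_at` helper, and the toy coordinates are hypothetical. It assumes BED-style 0-based, half-open intervals and VCF-style 1-based query positions; real annotation is distributed in formats such as GFF3, GTF, or BED and is usually queried through indexed interval structures rather than a linear scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One annotated element on a reference build (0-based, half-open, like BED)."""
    chrom: str
    start: int  # inclusive, 0-based
    end: int    # exclusive
    kind: str   # e.g. "gene", "exon", "repeat"
    name: str


def features_at(annotation, chrom, pos1):
    """Return the annotated features covering a 1-based (VCF-style) position.

    The 1-based position is converted to 0-based before testing
    membership in each half-open interval [start, end).
    """
    pos0 = pos1 - 1
    return [f for f in annotation
            if f.chrom == chrom and f.start <= pos0 < f.end]


# Hypothetical toy annotation; every coordinate here is invented.
annotation = [
    Feature("chr1", 1000, 5000, "gene", "GENE_A"),
    Feature("chr1", 1200, 1400, "exon", "GENE_A.exon1"),
    Feature("chr1", 8000, 8300, "repeat", "REPEAT_X"),
]

print([f.name for f in features_at(annotation, "chr1", 1250)])  # ['GENE_A', 'GENE_A.exon1']
```

The same numeric position means nothing until it is paired with a build and an annotation release made against that build, which is why both are versioned together.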

Clinical relevance

Because variant calling and interpretation are expressed in reference coordinates, the choice and version of the reference genome directly affect how genomic findings are reported and compared. This entry describes the reference infrastructure as educational material and is not a basis for individual clinical or diagnostic decisions.
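Because a position only has meaning relative to a specific build, one defensive pattern is to make the build part of every recorded variant and to refuse cross-build comparisons outright. The minimal sketch below assumes hypothetical `Variant` and `same_site` names and invented positions; it is not a standard data model, only an illustration of why the reference version must travel with the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A variant call that always records which reference build it uses."""
    build: str  # e.g. "GRCh37" or "GRCh38"
    chrom: str
    pos: int    # 1-based position on that build
    ref: str
    alt: str


def same_site(a: Variant, b: Variant) -> bool:
    """True if two calls describe the same allele on the same build.

    Raw positions from different builds are not comparable, so a
    cross-build comparison raises instead of matching numbers silently.
    """
    if a.build != b.build:
        raise ValueError(f"{a.build} vs {b.build}: lift over to one build first")
    return (a.chrom, a.pos, a.ref, a.alt) == (b.chrom, b.pos, b.ref, b.alt)


# Hypothetical calls; the positions and alleles are invented.
x = Variant("GRCh38", "chr1", 12345, "A", "G")
y = Variant("GRCh38", "chr1", 12345, "A", "G")
z = Variant("GRCh37", "chr1", 12345, "A", "G")

print(same_site(x, y))  # True
try:
    same_site(x, z)
except ValueError as err:
    print("refused:", err)
```

Raising an error rather than returning False is a deliberate choice here: a silent mismatch across builds looks like a genuine difference between samples.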

Evidence & guidelines

The reference is documented through consortium primary reports and assembly evaluations rather than clinical guidelines: the initial draft (2001) and finished euchromatic sequence (2004), the evaluation of the GRCh38 build (Schneider et al., 2017), and the complete telomere-to-telomere human genome (Nurk et al., 2022) define the current standard and its trajectory.

History

The human reference genome began with the draft sequence of 2001 and the finished euchromatic sequence of 2004, and was then maintained and improved by the Genome Reference Consortium across successive builds culminating in GRCh38. Persistent gaps in repetitive and centromeric regions were finally closed by the Telomere-to-Telomere (T2T) consortium, which in 2022 published the first complete, gapless human genome sequence (T2T-CHM13, covering the autosomes and chromosome X) and reshaped what a reference standard can be.

Key figures

  • Deanna Church
  • Valerie Schneider
  • Adam Phillippy
  • Karen Miga

Seminal works

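  • International Human Genome Sequencing Consortium (2004). Finishing the euchromatic sequence of the human genome. Nature.
  • Schneider VA et al. (2017). Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Research.
  • Nurk S et al. (2022). The complete sequence of a human genome. Science.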

Frequently asked questions

Why does a reference genome have different versions or builds?
As sequencing and assembly improve, the reference is revised to correct errors, close gaps, and better represent the species. Each release gets its own build version: coordinates are fixed within a build but can shift between builds, so the version label is what keeps results unambiguous and comparable.
What is genome annotation?
Annotation is the process of marking where genes, transcripts, regulatory elements, and other features lie on the reference sequence, converting a string of nucleotides into a biologically interpretable map.
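Because coordinates can shift between builds, results are moved from one build to another by coordinate mapping, commonly called liftover. The sketch below is a simplified model of that idea, assuming the alignment between two builds of one chromosome is given as ungapped blocks; the `BLOCKS` values and the `lift` function are hypothetical. Production tools such as UCSC liftOver read chain files and also handle strand, multiple chromosomes, and whole intervals.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical ungapped alignment blocks between an "old" and a "new" build
# of one chromosome: (old_start, new_start, length), 0-based, sorted by old_start.
# Real liftover reads chain files; these numbers are invented for illustration.
BLOCKS = [
    (0, 0, 10_000),            # identical opening segment
    (10_000, 10_500, 5_000),   # shifted by +500 after sequence added in the new build
    (16_000, 16_500, 4_000),   # old 15_000 to 16_000 lies between blocks: unmappable
]


def lift(pos0: int, blocks=BLOCKS) -> Optional[int]:
    """Map a 0-based position on the old build to the new build, or return None."""
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos0) - 1  # last block starting at or before pos0
    if i < 0:
        return None
    old_start, new_start, length = blocks[i]
    if pos0 < old_start + length:
        return new_start + (pos0 - old_start)
    return None  # position falls in a gap with no counterpart


print(lift(5))       # 5
print(lift(12_000))  # 12500
print(lift(15_500))  # None
```

Returning `None` for positions outside any block mirrors the fact that some sequence in one build simply has no counterpart in the other, which is one reason results cannot always be carried across builds unchanged.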