Reference Genome Standards and Annotation

A reference genome is a curated, representative consensus sequence for a species that serves as the shared coordinate system against which new sequence data are aligned and interpreted. Maintaining it as versioned builds, and layering biological annotation onto it, is what makes genomic results comparable across studies, laboratories, and time.

Definition

A reference genome is a curated consensus nucleotide sequence chosen to represent a species' genome, maintained as versioned assemblies (builds) and annotated with the locations of genes and other functional elements, that provides a stable coordinate framework for aligning and interpreting genomic data.

Scope

The entry covers what a reference assembly is, how it is versioned into successive builds (such as the human GRCh38 assembly and the telomere-to-telomere assembly), the role of annotation in marking genes and functional features, and the move toward more complete and representative references. It is a reference and infrastructural topic, not clinical guidance.

Core questions

What is a reference genome and why does the field standardise on one?
How and why are reference assemblies versioned into successive builds?
What does genome annotation add to a reference sequence?

Key concepts

Reference assembly (consensus sequence)
Genome build and versioning (e.g., GRCh38)
Genome annotation
Coordinate system for alignment
Telomere-to-telomere (gapless) assembly
Assembly gaps and finishing
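
To make the idea of an assembly gap concrete: in FASTA-formatted assemblies, unresolved stretches are conventionally written as runs of N. The sketch below scans a sequence for such runs. The function name `find_gaps` and the toy sequence are illustrative assumptions, not part of any real assembly or tool.

```python
import re


def find_gaps(sequence: str, min_length: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) for each run of N in the sequence.

    Coordinates are 0-based and half-open, so end - start is the gap length.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_length
    ]


# Hypothetical toy contig with two unresolved stretches (not real data).
toy_contig = "ACGTNNNNNACGTACGTNNACG"
gaps = find_gaps(toy_contig)
gap_bases = sum(end - start for start, end in gaps)
```

A gapless (telomere-to-telomere) assembly is one for which a scan of this kind would find nothing to report.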

Mechanisms

A reference genome is assembled from high-quality sequence data into a consensus intended to represent the species rather than any single individual, then released as a versioned build. Within a build the coordinates are fixed and citable; between builds they can shift, which is why a coordinate is only meaningful together with the build it refers to. Annotation overlays the sequence with the positions of genes, transcripts, and regulatory and repetitive elements, turning raw coordinates into a biologically interpretable map. Successive builds incorporate corrections, fill gaps, and improve representation: the human reference progressed from the 2001 draft and the 2004 finished euchromatic sequence to the GRCh38 build, and then to a complete telomere-to-telomere assembly that resolved previously inaccessible regions.
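
The overlay described above can be pictured as a list of labelled intervals on a build's coordinate system, queried by position. The following is a minimal sketch under stated assumptions: the `Feature` class, the `features_at` function, the build name `toyBuild1`, the chromosome `chrT`, and all coordinates are hypothetical and chosen only for illustration. Real annotation is distributed in formats such as GFF3 or GTF and is usually queried with indexed structures rather than a linear scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    build: str   # assembly version the coordinates belong to
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    kind: str    # e.g. "gene", "exon", "repeat"
    name: str


def features_at(features, build, chrom, pos):
    """Return every feature on the given build that covers the position."""
    return [
        f for f in features
        if f.build == build and f.chrom == chrom and f.start <= pos <= f.end
    ]


# Hypothetical annotation for a made-up build; not real genes or coordinates.
TOY_ANNOTATION = [
    Feature("toyBuild1", "chrT", 100, 900, "gene", "GENE_A"),
    Feature("toyBuild1", "chrT", 100, 250, "exon", "GENE_A.exon1"),
    Feature("toyBuild1", "chrT", 600, 900, "exon", "GENE_A.exon2"),
    Feature("toyBuild1", "chrT", 1200, 1500, "repeat", "REP_1"),
]

hits = [f.name for f in features_at(TOY_ANNOTATION, "toyBuild1", "chrT", 200)]
```

Because the build is part of every feature record, a query against a different build returns nothing instead of silently matching coordinates that mean something else.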

Clinical relevance

Because variant calling and interpretation are expressed in reference coordinates, the choice and version of the reference genome directly affect how genomic findings are reported and compared. This entry describes the reference infrastructure as educational material and is not a basis for individual clinical or diagnostic decisions.
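
One way to make this dependence explicit is to carry the build name with every coordinate and refuse to compare positions across builds. The sketch below is illustrative only: the `Locus` class, the `same_site` function, and the build names `buildA` and `buildB` are hypothetical. In practice, coordinates are converted between builds with dedicated liftover tools rather than compared directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    build: str   # reference build the position is expressed in
    chrom: str
    pos: int     # 1-based position on that build


def same_site(a: Locus, b: Locus) -> bool:
    """Compare two loci, but only if they are on the same build."""
    if a.build != b.build:
        raise ValueError(
            f"cannot compare {a.build} and {b.build} coordinates directly; "
            "convert one to the other build first"
        )
    return a.chrom == b.chrom and a.pos == b.pos


# Hypothetical loci on made-up builds, for illustration only.
reported = Locus("buildA", "chrT", 1000)
observed = Locus("buildA", "chrT", 1000)
match = same_site(reported, observed)
```

Raising an error on a build mismatch is a deliberate design choice in this sketch: equal numbers on different builds may refer to different bases, so returning either True or False would be misleading.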

Evidence & guidelines

The reference is documented through consortium primary reports and assembly evaluations rather than clinical guidelines: the initial draft (2001) and finished euchromatic sequence (2004), the evaluation of the GRCh38 build (Schneider et al., 2017), and the complete telomere-to-telomere human genome (Nurk et al., 2022) define the current standard and its trajectory.

History

The human reference genome began with the draft sequence of 2001 and the finished euchromatic sequence of 2004. It was then maintained and improved by the Genome Reference Consortium across successive builds, culminating in GRCh38. Persistent gaps in repetitive and centromeric regions were finally closed by the Telomere-to-Telomere Consortium, which published the first complete, gapless sequence of a human genome in 2022 and reshaped what a reference standard can be.

Key figures

Deanna Church
Valerie Schneider
Adam Phillippy
Karen Miga

Seminal works

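International Human Genome Sequencing Consortium (2004). Finishing the euchromatic sequence of the human genome. Nature.
Schneider et al. (2017). Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Research.
Nurk et al. (2022). The complete sequence of a human genome. Science.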

Frequently asked questions

Why does a reference genome have different versions or builds?

As sequencing and assembly improve, the reference is revised to correct errors, close gaps, and better represent the species. Each release is given a build version so that coordinates are fixed within that build and results reported against the same build stay comparable.

What is genome annotation?

Annotation is the process of marking where genes, transcripts, regulatory elements, and other features lie on the reference sequence, converting a string of nucleotides into a biologically interpretable map.