Genome Sequencing, Assembly, and Reference Standards

This area covers how the order of nucleotides in a genome is read, how the resulting fragments are reconstructed into longer contiguous sequences, and how curated reference genomes are built and maintained so that new data can be aligned and interpreted against a shared standard. Together these steps form the technical foundation on which nearly all of genomics rests.

Definition

Genome sequencing is the determination of the nucleotide order of an organism's DNA; assembly is the computational reconstruction of overlapping sequence reads into longer contiguous sequences; and reference standards are the curated, versioned genome assemblies and annotations against which new sequence data are aligned and compared.

Scope

The area spans sequencing chemistries from Sanger dideoxy sequencing through high-throughput short-read and long-read platforms, the computational assembly of reads into contigs and scaffolds, the construction and annotation of reference genomes such as GRCh38 and the telomere-to-telomere assembly, and the quality-control and error-correction steps that govern data reliability. It treats these as methodological and infrastructural topics, not as clinical procedures.

Core questions

How is the nucleotide order of a genome determined, and how have sequencing chemistries evolved?
How are short or long sequence reads reconstructed into a complete genome?
What makes a genome assembly a usable reference, and how is it versioned and annotated?
How are sequencing errors detected, quantified, and corrected so that downstream analyses are trustworthy?

Key concepts

Read, contig, and scaffold
Coverage and sequencing depth
Short-read versus long-read sequencing
De novo assembly versus reference-guided alignment
Reference genome and genome build (e.g., GRCh38)
Genome annotation
Per-base quality (Phred) score
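
Two of the concepts listed above reduce to short formulas, sketched here in Python. The sketch assumes the standard Phred definition (Q = -10 log10 P, where P is the estimated probability that a base call is wrong), the Phred+33 ASCII offset used by most current FASTQ files, and the common approximation that mean depth equals total sequenced bases divided by genome size. The function names are illustrative and do not come from any particular library.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the estimated probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


def decode_fastq_qualities(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into per-base scores, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_string]


def mean_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Approximate mean sequencing depth as total bases read divided by genome size."""
    return num_reads * read_length / genome_size


if __name__ == "__main__":
    print(phred_to_error_prob(30))                       # 0.001
    print(decode_fastq_qualities("I5!"))                 # [40, 20, 0]
    print(mean_depth(600_000_000, 150, 3_000_000_000))   # 30.0
```

A score of Q30 therefore corresponds to roughly one expected error per thousand bases, and 600 million reads of 150 bases over a genome of about 3 billion bases give a mean depth of about 30x. Actual depth varies along the genome, so the mean is only a planning figure.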

Mechanisms

Sequencing platforms convert physical DNA into machine-readable base calls, each accompanied by a quality estimate. Because most platforms read fragments far shorter than a chromosome, the reads must be assembled. De novo assembly reconstructs the genome from overlaps among the reads themselves, historically with overlap-layout-consensus methods (still common for long reads) and now often with de Bruijn graphs for short reads. Reference-guided analysis instead aligns reads to an existing assembly. A reference genome is a curated consensus sequence, versioned as successive builds and layered with annotation, that provides the coordinate system for the field. Quality control and error correction span the whole pipeline, estimating per-base accuracy and removing or correcting artefacts before variants are called.
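
The de Bruijn graph idea can be illustrated with a toy Python sketch. It assumes error-free reads from a single strand and a sequence with no repeat of length k-1 or longer, so that the graph has exactly one path through all edges. Production assemblers must also handle sequencing errors, reverse complements, uneven coverage, and repeats, none of which is attempted here. All function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes of the distinct k-mers seen."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the toy output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Walk every edge exactly once (Hierholzer's algorithm); assumes such a path exists."""
    out_deg = {node: len(targets) for node, targets in graph.items()}
    in_deg = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    # Start at a node with one more outgoing than incoming edge, if there is one.
    start = next((n for n in graph if out_deg[n] - in_deg[n] == 1), next(iter(graph)))
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the contig: the first node, then the last base of each later node."""
    path = eulerian_path(build_de_bruijn_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]  # overlapping reads from ATGGCGTGCAAT
    print(assemble(reads, k=4))                # ATGGCGTGCAAT
```

The choice of k is the main design trade-off in this approach: a larger k separates more repeats but requires reads to overlap by at least k-1 error-free bases.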

Clinical relevance

Reliable sequencing, assembly, and reference standards underpin clinical and research genomics, since variant interpretation depends on accurate reads aligned to a well-characterised reference. This area describes the infrastructure that generates genomic evidence; it is reference and educational material and not a basis for individual diagnostic or treatment decisions.

Evidence & guidelines

The methods here are documented through landmark primary studies and consortium reports rather than clinical guidelines. Sanger's chain-termination method (1977), the Human Genome Project's draft sequence (2001), Metzker's review of next-generation platforms (2010), and the complete telomere-to-telomere human genome (Nurk et al., 2022) trace the field's trajectory.

History

Practical DNA sequencing took off with Sanger's chain-termination chemistry in 1977, which enabled the first genomes to be read and remained the workhorse through the Human Genome Project's draft sequence in 2001. The subsequent rise of high-throughput (next-generation) platforms drove costs down by orders of magnitude. Long-read technologies later made it possible to resolve repetitive regions that short reads could not span, culminating in the first complete, gapless human genome in 2022.

Key figures

Frederick Sanger
Eric Lander
Michael Metzker
Sergey Koren
Adam Phillippy

Seminal works

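Sanger, Nicklen, and Coulson (1977), "DNA sequencing with chain-terminating inhibitors"
International Human Genome Sequencing Consortium (2001), "Initial sequencing and analysis of the human genome"
Metzker (2010), "Sequencing technologies - the next generation"
Nurk et al. (2022), "The complete sequence of a human genome"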

Frequently asked questions

What is the difference between sequencing and assembly?
Sequencing reads the order of nucleotides in DNA fragments, while assembly is the computational step that reconstructs those fragments into longer, contiguous sequences such as contigs, scaffolds, or whole chromosomes.

Why does the field need a reference genome?
A reference genome provides a shared, versioned coordinate system so that new sequence data from different individuals and laboratories can be aligned, compared, and interpreted consistently.