ScholarGate

Genome Sequencing and Assembly

Reading a genome means determining the order of its bases, roughly three billion of them in a human genome. Sequencing machines read DNA only in pieces far shorter than a chromosome, so software must reconstruct the full sequence by finding where those pieces overlap.

Definition

Genome sequencing is the experimental determination of the nucleotide order of an organism's DNA, and assembly is the computational reconstruction of the complete sequence from the many reads, each far shorter than the genome, that a sequencer produces.

Scope

This topic covers Sanger dideoxy sequencing, the principles of next-generation and long-read sequencing, the whole-genome shotgun and clone-based strategies, the computational assembly of reads into contigs and scaffolds, measures of assembly quality such as coverage and contiguity, and the resulting reference genomes. It treats how a genome's sequence is determined; the interpretation of that sequence is covered in the adjacent topics.

Core questions

  • How does Sanger sequencing determine the order of bases using chain terminators?
  • What makes next-generation and long-read sequencing faster and cheaper, and what are their trade-offs?
  • How are millions of overlapping reads assembled into chromosomes?
  • What do coverage and contiguity measures tell us about an assembly's quality?

Key concepts

  • Sanger dideoxy sequencing
  • Next-generation and long-read sequencing
  • Whole-genome shotgun strategy
  • Read assembly: contigs and scaffolds
  • Coverage, contiguity, and reference genomes
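Contiguity is commonly summarized with the N50 statistic: sort the contigs (or scaffolds) from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The function below is a minimal sketch of that definition, assuming the contig lengths are already known; the name `n50` is illustrative and not taken from any particular assembler or quality-control tool.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of an assembly (illustrative helper).

    Contigs are taken longest first; N50 is the length of the contig at which
    the cumulative length first reaches half of the total assembled length.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid rounding problems with odd totals.
        if 2 * running >= total:
            return length
    # Only reached when the input list is empty.
    raise ValueError("contig_lengths must not be empty")


# Five contigs totalling 300 bases: 100 + 80 = 180 is the first running total >= 150.
print(n50([20, 100, 40, 80, 60]))  # 80
```

A larger N50 means more of the genome sits in long contiguous pieces; it says nothing about whether those pieces were joined correctly.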

Mechanisms

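Sanger sequencing adds chain-terminating dideoxynucleotides to a DNA polymerase reaction; because they lack the 3'-hydroxyl group needed to attach the next nucleotide, each incorporation ends the strand, producing a ladder of fragments whose lengths and terminal bases reveal the sequence. Massively parallel platforms read millions to billions of fragments at once. Assembly software detects overlaps among reads and merges them into contiguous sequences (contigs), then orders and orients the contigs into scaffolds along each chromosome using paired-end reads or other long-range linking information.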
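The ladder logic can be sketched in a few lines of Python. This toy model assumes the classic four-reaction format (one dideoxynucleotide per lane), that every possible terminated fragment is present, and that fragment length alone fixes a band's position; the function names are hypothetical, and real traces, dye chemistry, and base-calling errors are ignored. The template is written 3' to 5', the direction in which the polymerase reads it, so the synthesized strand is simply its base-by-base complement.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_lanes(template: str) -> dict[str, list[int]]:
    """Simulate four chain-termination reactions (hypothetical helper).

    Each lane holds the lengths of fragments that ended when the polymerase
    incorporated that lane's dideoxynucleotide.
    """
    synthesized = "".join(COMPLEMENT[base] for base in template)
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        # A fragment of this length terminated with the ddNTP matching `base`.
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest to longest, noting which lane each came from."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = sanger_lanes("TACGGATC")
print(lanes)            # {'A': [1, 7], 'C': [4, 5], 'G': [3, 8], 'T': [2, 6]}
print(read_gel(lanes))  # ATGCCTAG, the complement of the template
```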

Clinical relevance

Affordable sequencing has made whole-genome and exome sequencing routine for diagnosing rare inherited disease, profiling tumors, and identifying pathogens, and genomic newborn screening is being evaluated in pilot programs. Sequence determination has moved from a landmark project to a standard laboratory test.

History

Sanger introduced chain-termination sequencing in 1977. The public Human Genome Project used a clone-by-clone (hierarchical shotgun) strategy, while Craig Venter's Celera Genomics used whole-genome shotgun sequencing, and both published draft human sequences in 2001. The arrival of next-generation sequencing in the mid-2000s, followed by long-read platforms, drove the cost of sequencing a human genome from billions of dollars toward a few hundred.

Key figures

  • Frederick Sanger
  • Eric Lander
  • Craig Venter

Seminal works

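  • Sanger F, Nicklen S, Coulson AR (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences USA 74(12): 5463–5467.
  • Lander ES et al., International Human Genome Sequencing Consortium (2001). Initial sequencing and analysis of the human genome. Nature 409: 860–921.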

Frequently asked questions

Why must a genome be assembled rather than just read straight through?
Sequencing instruments read only stretches of DNA far shorter than a chromosome, so a genome is broken into many fragments that are read separately; assembly software then reconstructs the original order by detecting where the fragments overlap.
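The overlap idea can be illustrated with a greedy merger: repeatedly join the two sequences with the longest suffix-prefix overlap until no overlap of at least `min_len` bases remains. This is a minimal sketch under strong assumptions (error-free reads, all from one strand, each at least `min_len` bases long, no repeats longer than the reads); production assemblers instead build overlap or de Bruijn graphs and must handle sequencing errors, reverse complements, and repeats. The function names and the `min_len` threshold are illustrative.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b`, or 0 if none reaches min_len."""
    start = 0
    while True:
        # Next position in `a` where the first min_len bases of `b` occur.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The earliest position whose suffix is a prefix of `b` gives the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge reads into contigs by longest suffix-prefix overlap (toy assembler)."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Reads lying entirely inside another read add no new sequence.
    contigs = [r for r in contigs if not any(r != other and r in other for other in contigs)]
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs  # no overlaps left: the remaining sequences are the contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Overlapping 6-base reads from the made-up sequence ATCGGCTAAGTCCA, supplied out of order.
reads = ["TAAGTC", "ATCGGC", "AGTCCA", "GGCTAA"]
print(greedy_assemble(reads))  # ['ATCGGCTAAGTCCA']
```

Greedy merging is easily misled by repeats longer than the overlap, one reason real assemblers rely on graph methods.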
What does sequencing coverage mean?
Coverage is the average number of times each base in the genome is read; higher coverage gives more confidence in each base call and helps distinguish true variants from sequencing errors.
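Coverage follows directly from its definition. The sketch below computes per-base depth for reads placed at known start positions and, separately, the expected mean depth C = N × L / G for N reads of length L on a genome of G bases (the Lander–Waterman relationship). The Poisson estimate e^(-C) for the fraction of bases left unread assumes reads start uniformly at random, which real libraries only approximate. The helper names and the example numbers are illustrative.

```python
import math


def depth_profile(read_starts: list[int], read_length: int, genome_size: int) -> list[int]:
    """Number of reads covering each position, given each read's 0-based start."""
    depth = [0] * genome_size
    for start in read_starts:
        for pos in range(start, min(start + read_length, genome_size)):
            depth[pos] += 1
    return depth


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average depth C = N * L / G."""
    return num_reads * read_length / genome_size


def fraction_uncovered(coverage: float) -> float:
    """Poisson estimate e^-C of the fraction of bases read zero times (random read starts assumed)."""
    return math.exp(-coverage)


depth = depth_profile([0, 2, 4], read_length=4, genome_size=8)
print(depth)                    # [1, 1, 2, 2, 2, 2, 1, 1]
print(sum(depth) / len(depth))  # 1.5, matching mean_coverage(3, 4, 8)

# Illustrative: 600 million 150-base reads on a 3-gigabase genome.
c = mean_coverage(600_000_000, 150, 3_000_000_000)
print(c)                              # 30.0
print(fraction_uncovered(c) < 1e-12)  # True
```

Actual depth scatters around this mean, so a high average does not guarantee that every base is covered.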
