ScholarGate
Asistents

Sequence Alignment Algorithms

Sequence alignment algorithms measure and display the similarity between two strings by finding the lowest-cost way to transform one into the other, using dynamic programming over insertions, deletions and substitutions.

Atrast tematu ar PaperMindDrīzumāFind papers & topics
Tools & resources
Lejupielādēt slaidus
Learn & explore
VideoDrīzumā

Definition

Sequence alignment is the arrangement of two (or more) strings to maximize their correspondence by inserting gaps and matching, mismatching, or substituting characters; an alignment algorithm computes an optimal alignment and its score under a given cost or scoring model, typically via dynamic programming.

Scope

This topic covers the dynamic-programming algorithms for comparing strings: edit (Levenshtein) distance and the longest common subsequence, global alignment (Needleman-Wunsch), local alignment (Smith-Waterman), and the scoring schemes (gap penalties, substitution matrices) that shape biological alignments. It treats the underlying recurrence and its quadratic time and space cost, plus space-saving refinements. Index-based and exact-matching methods are covered in related topics.

Core questions

  • How is similarity between two strings quantified as an edit distance or alignment score?
  • How does a dynamic-programming table compute an optimal alignment?
  • What distinguishes global alignment from local alignment, and when is each appropriate?
  • How do gap penalties and substitution matrices influence the resulting alignment?
  • How can the quadratic space requirement be reduced for long sequences?

Key concepts

  • edit (Levenshtein) distance
  • longest common subsequence
  • global alignment
  • local alignment
  • Needleman-Wunsch algorithm
  • Smith-Waterman algorithm
  • gap penalties
  • substitution matrices

Key theories

Global alignment by dynamic programming
The Needleman-Wunsch algorithm fills a dynamic-programming table whose entries give the best alignment score of string prefixes, yielding an optimal end-to-end (global) alignment of two sequences in time proportional to the product of their lengths.
Local alignment for shared subsequences
The Smith-Waterman algorithm modifies the recurrence to allow alignments to start and end anywhere, finding the highest-scoring matching region (local alignment), which is essential for detecting conserved fragments within otherwise dissimilar sequences.

Clinical relevance

Sequence alignment is a cornerstone of computational biology, used to compare DNA, RNA and protein sequences for homology, evolutionary inference, and database search (the heuristic BLAST builds on local alignment). Outside biology, the same edit-distance machinery powers spell checking, fuzzy text matching, version-control diffs, and speech and handwriting comparison.

History

Levenshtein introduced edit distance in 1965. Needleman and Wunsch gave the first dynamic-programming global-alignment algorithm for proteins in 1970, and Smith and Waterman introduced local alignment in 1981. Hirschberg's linear-space technique (1975) and later heuristics such as BLAST extended alignment to large-scale practical use.

Key figures

  • Saul Needleman
  • Christian Wunsch
  • Temple Smith
  • Michael Waterman
  • Vladimir Levenshtein

Related topics

Seminal works

  • needleman1970
  • smith1981
  • gusfield1997

Frequently asked questions

What is the difference between global and local alignment?
Global alignment (Needleman-Wunsch) aligns two sequences end to end and is appropriate when they are of similar length and expected to be related throughout. Local alignment (Smith-Waterman) finds the best-matching subregion and is appropriate when only parts of the sequences are similar, such as a conserved domain within larger sequences.
Is sequence alignment the same as the longest common subsequence?
They are closely related. The longest common subsequence is a special case of alignment with particular scoring (matches rewarded, gaps free, mismatches disallowed). General alignment uses richer scoring with substitution costs and gap penalties, but both are solved by the same kind of dynamic-programming recurrence.

Methods for this concept

Related concepts