ScholarGate
Asistent

String Matching

String matching finds all occurrences of a pattern within a text; classic algorithms preprocess the pattern to skip redundant comparisons and achieve linear or sublinear search time.

Pronađite temu uz PaperMindUskoroFind papers & topics
Tools & resources
Preuzmi slajdove
Learn & explore
VideoUskoro

Definition

String matching is the problem of finding all positions in a text where a given pattern string occurs; an exact string-matching algorithm reports every offset at which the pattern aligns identically with a substring of the text.

Scope

This topic covers exact single-pattern matching algorithms — the naive method, Knuth-Morris-Pratt, Boyer-Moore, and the hashing-based Rabin-Karp — together with the preprocessing (failure functions, shift tables, rolling hashes) that makes them efficient, and the extension to multiple-pattern matching via the Aho-Corasick automaton. Approximate matching and index-based search are covered in related topics.

Core questions

  • Why does naive matching take time proportional to the product of pattern and text lengths?
  • How does preprocessing the pattern avoid re-examining text characters?
  • How do the Knuth-Morris-Pratt failure function and Boyer-Moore shift rules work?
  • How does Rabin-Karp use hashing to test alignments quickly?
  • How is matching extended to many patterns simultaneously?

Key concepts

  • naive matching
  • Knuth-Morris-Pratt algorithm
  • failure function
  • Boyer-Moore algorithm
  • bad-character and good-suffix rules
  • Rabin-Karp algorithm
  • rolling hash
  • Aho-Corasick automaton

Key theories

Knuth-Morris-Pratt failure function
By precomputing, for each pattern prefix, the longest proper prefix that is also a suffix, the Knuth-Morris-Pratt algorithm shifts the pattern intelligently after a mismatch and never re-examines text characters, giving linear-time matching.
Boyer-Moore right-to-left scanning
Boyer-Moore compares the pattern from its last character backward and uses bad-character and good-suffix rules to skip large portions of the text, achieving sublinear average-case performance on typical inputs.

Clinical relevance

String matching underlies everyday tools and infrastructure: search in text editors and command-line utilities, content filtering and intrusion-detection signature scanning, plagiarism detection, and the location of motifs in biological sequences. Multiple-pattern matching scales these to dictionaries of many patterns at once.

History

The Knuth-Morris-Pratt algorithm (published 1977, developed earlier) gave the first widely known linear-time exact matcher, and Boyer and Moore (1977) introduced sublinear average-case matching the same year. Rabin and Karp's hashing approach and the Aho-Corasick multi-pattern automaton (1975) further broadened the field.

Key figures

  • Donald Knuth
  • James H. Morris
  • Vaughan Pratt
  • Robert Boyer
  • J Strother Moore
  • Michael Rabin
  • Richard Karp

Related topics

Seminal works

  • knuth1977
  • boyer1977
  • cormen2009

Frequently asked questions

Which exact string-matching algorithm is fastest?
It depends on the input. Knuth-Morris-Pratt guarantees linear worst-case time; Boyer-Moore is often fastest in practice on natural text because it skips characters and runs sublinearly on average; Rabin-Karp shines when searching for many patterns or doing fingerprint comparisons. There is no single universal winner.
How does Rabin-Karp avoid comparing every character at every position?
It hashes the pattern and each text window of the same length using a rolling hash that updates in constant time as the window slides. Only when hashes match does it verify the actual characters, so most non-matching positions are rejected by a single cheap comparison.

Methods for this concept

Related concepts