Which exact string-matching algorithm is fastest?

It depends on the input. Knuth-Morris-Pratt guarantees linear worst-case time; Boyer-Moore is often fastest in practice on natural text because it skips characters and runs sublinearly on average; Rabin-Karp shines when searching for many patterns or doing fingerprint comparisons. There is no single universal winner.

How does Rabin-Karp avoid comparing every character at every position?

It hashes the pattern and each text window of the same length using a rolling hash that updates in constant time as the window slides. Only when hashes match does it verify the actual characters, so most non-matching positions are rejected by a single cheap comparison.

String Matching

String matching finds all occurrences of a pattern within a text; classic algorithms preprocess the pattern to skip redundant comparisons and achieve linear or sublinear search time.

Znajdź temat z PaperMindWkrótceFind papers & topics

Tools & resources

Pobierz slajdy

Learn & explore

WideoWkrótce

Definition

String matching is the problem of finding all positions in a text where a given pattern string occurs; an exact string-matching algorithm reports every offset at which the pattern aligns identically with a substring of the text.

Scope

This topic covers exact single-pattern matching algorithms — the naive method, Knuth-Morris-Pratt, Boyer-Moore, and the hashing-based Rabin-Karp — together with the preprocessing (failure functions, shift tables, rolling hashes) that makes them efficient, and the extension to multiple-pattern matching via the Aho-Corasick automaton. Approximate matching and index-based search are covered in related topics.

Core questions

Why does naive matching take time proportional to the product of pattern and text lengths?
How does preprocessing the pattern avoid re-examining text characters?
How do the Knuth-Morris-Pratt failure function and Boyer-Moore shift rules work?
How does Rabin-Karp use hashing to test alignments quickly?
How is matching extended to many patterns simultaneously?

Key concepts

naive matching
Knuth-Morris-Pratt algorithm
failure function
Boyer-Moore algorithm
bad-character and good-suffix rules
Rabin-Karp algorithm
rolling hash
Aho-Corasick automaton

Key theories

Knuth-Morris-Pratt failure function: By precomputing, for each pattern prefix, the longest proper prefix that is also a suffix, the Knuth-Morris-Pratt algorithm shifts the pattern intelligently after a mismatch and never re-examines text characters, giving linear-time matching.
Boyer-Moore right-to-left scanning: Boyer-Moore compares the pattern from its last character backward and uses bad-character and good-suffix rules to skip large portions of the text, achieving sublinear average-case performance on typical inputs.

Clinical relevance

String matching underlies everyday tools and infrastructure: search in text editors and command-line utilities, content filtering and intrusion-detection signature scanning, plagiarism detection, and the location of motifs in biological sequences. Multiple-pattern matching scales these to dictionaries of many patterns at once.

History

The Knuth-Morris-Pratt algorithm (published 1977, developed earlier) gave the first widely known linear-time exact matcher, and Boyer and Moore (1977) introduced sublinear average-case matching the same year. Rabin and Karp's hashing approach and the Aho-Corasick multi-pattern automaton (1975) further broadened the field.

Key figures

Donald Knuth
James H. Morris
Vaughan Pratt
Robert Boyer
J Strother Moore
Michael Rabin
Richard Karp

Seminal works

knuth1977
boyer1977
cormen2009

Frequently asked questions

Which exact string-matching algorithm is fastest?: It depends on the input. Knuth-Morris-Pratt guarantees linear worst-case time; Boyer-Moore is often fastest in practice on natural text because it skips characters and runs sublinearly on average; Rabin-Karp shines when searching for many patterns or doing fingerprint comparisons. There is no single universal winner.
How does Rabin-Karp avoid comparing every character at every position?: It hashes the pattern and each text window of the same length using a rolling hash that updates in constant time as the window slides. Only when hashes match does it verify the actual characters, so most non-matching positions are rejected by a single cheap comparison.