Cryptographic Hash Functions
A cryptographic hash function compresses arbitrary input into a short fixed-length digest in a way that is one-way and collision-resistant, serving as a basic building block for integrity, authentication, and many higher-level protocols.
Definition
A cryptographic hash function is a deterministic function mapping inputs of arbitrary length to a fixed-length output (digest) such that it is infeasible to find a preimage, a second preimage, or any two distinct inputs producing the same output.
Scope
This topic covers the security properties of hash functions (preimage, second-preimage, and collision resistance), their internal constructions (Merkle-Damgaard, sponge), the standardized families SHA-2 and SHA-3, and applications such as integrity verification, password storage, commitment, and proof-of-work. It addresses cryptanalytic attacks including the birthday bound and the breaks of MD5 and SHA-1. It excludes keyed authentication codes, which are treated under message authentication codes, though many MACs are built from hash functions.
Core questions
- What distinguishes a cryptographic hash function from an ordinary checksum or hash table function?
- Why are preimage, second-preimage, and collision resistance distinct, and how do they relate?
- Why does the birthday paradox bound collision resistance at half the digest length?
- How do Merkle-Damgaard and sponge constructions extend a fixed-size compression to arbitrary input?
- How are hash functions used to commit to data, verify integrity, and store passwords?
Key concepts
- message digest
- preimage resistance
- second-preimage resistance
- collision resistance
- birthday attack
- Merkle-Damgaard construction
- sponge construction (Keccak)
- length-extension attack
- proof-of-work
Key theories
- Collision resistance and the birthday bound
- Because of the birthday paradox, a generic collision-finding attack on an n-bit hash succeeds in about 2^(n/2) work; collision resistance therefore requires digests roughly twice the desired security level (e.g., 256 bits for 128-bit security).
- Iterated and sponge constructions
- The Merkle-Damgaard paradigm builds a full hash by iterating a fixed-input compression function over message blocks; the sponge construction used by SHA-3 (Keccak) absorbs input into and squeezes output from a large permutation state, avoiding length-extension weaknesses.
Mechanisms
A Merkle-Damgaard hash pads the message, splits it into blocks, and iteratively applies a compression function that mixes each block into a running chaining value, outputting the final value as the digest. SHA-3's sponge construction instead absorbs message blocks by XORing them into part of a large state permuted by the Keccak-f function, then squeezes out digest bits, which avoids the length-extension property inherent to plain Merkle-Damgaard.
Clinical relevance
Hash functions are pervasive: they verify software-download integrity, anchor digital signatures (which sign a hash, not the whole document), index content in Git and content-delivery networks, secure password storage via slow key-derivation functions (bcrypt, Argon2), and provide the proof-of-work and Merkle trees underlying blockchains. The practical break of MD5 and SHA-1 collision resistance forced migration to SHA-2 across the internet.
Evidence & guidelines
SHA-2 (SHA-256, SHA-512) is specified in FIPS 180-4 and SHA-3 in FIPS 202; both are NIST-approved. MD5 and SHA-1 are deprecated for security use after demonstrated collision attacks (the 2017 SHAttered SHA-1 collision being a landmark). For password storage, standards recommend memory-hard functions such as Argon2 rather than plain hashing.
History
The MD family (Rivest's MD4 and MD5) and the NSA-designed SHA-0/SHA-1 dominated the 1990s. Xiaoyun Wang's collision attacks (2004-2005) broke MD5 and weakened SHA-1, prompting NIST's open SHA-3 competition (2007-2012), won by the Keccak sponge design of Bertoni, Daemen, Peeters, and Van Assche. SHA-1 was practically broken by the SHAttered collision in 2017.
Key figures
- Ralph Merkle
- Ivan Damgaard
- Guido Bertoni
- Joan Daemen
- Xiaoyun Wang
Related topics
Seminal works
- nist2015sha3
- katz2020
- menezes1996
Frequently asked questions
- Why can't I just hash a password and store the digest?
- Plain cryptographic hashes are fast, which lets attackers try billions of guesses per second against a stolen database. Password storage instead uses deliberately slow, salted, memory-hard functions (bcrypt, scrypt, Argon2) so that brute-forcing each password is expensive.
- Is SHA-256 broken now that SHA-1 is?
- No. SHA-1 and SHA-256 are different algorithms; the collision attacks on SHA-1 do not transfer to SHA-2. SHA-256 remains unbroken and is the recommended general-purpose hash, alongside SHA-3.