MFCC (Mel-Frequency Cepstral Coefficients)

Also known as: mel-cepstral features, MFCC features, mel-frequency features

Mel-Frequency Cepstral Coefficients (MFCCs) are a compact representation of the short-term spectrum of an audio signal, computed on a frequency scale that approximates human pitch perception. Introduced by Davis and Mermelstein in 1980, MFCCs became a de facto standard feature set for speech recognition and are also widely used for environmental sound analysis. They compress the frequency content of each short frame into a small set of coefficients that retain much of the phonetic content while discarding fine spectral detail and phase.
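
To make the compression step concrete, here is a minimal NumPy sketch of the usual MFCC pipeline: framing and windowing, power spectrum, triangular mel filterbank, logarithm, and a discrete cosine transform. The function names and all parameter values (16 kHz sample rate, 25 ms frames, 10 ms hop, 512-point FFT, 26 filters, 13 coefficients) are illustrative assumptions, not taken from this page. The mel formula used is one common variant, and scaling conventions differ between toolkits.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs)."""
    # 1. Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Power spectrum of each frame (phase is discarded here).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Mel filterbank energies, then log compression (small floor avoids log(0)).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # 4. Unnormalized DCT-II across filters; keep the first n_coeffs coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), n + 0.5) / n_filters)
    return log_energies @ basis.T

# Usage: one second of synthetic noise stands in for real audio.
rng = np.random.default_rng(0)
features = mfcc(rng.standard_normal(16000))
```

Each row of `features` describes one frame, so a second of audio is reduced from 16000 samples to 98 frames of 13 numbers under these assumed settings.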

When to use it

Use MFCCs for speech recognition, speaker identification, emotion detection, and environmental sound classification. They are a long-standing standard input for automatic speech recognition (ASR) systems and remain a common input feature for deep learning models. They work best on signals whose information lies mainly in the spectral envelope, such as speech and many environmental sounds. Avoid them for music analysis without modification; music tasks usually need other features, such as spectral flux or chroma.

Strengths & limitations

Strengths

Compact representation (~13 coefficients per frame) reduces computational cost and storage
Aligned with human auditory perception; captures perceptually important information
Reasonably robust to speaker variability and, once normalized, to moderate noise
Widely implemented in speech and machine learning toolkits (HTK, Kaldi, TensorFlow)

Limitations

Discards phase information; the original audio cannot be exactly reconstructed from MFCCs alone
Assumes stationarity within frames; short-term features miss longer-term patterns (partly addressed by delta and acceleration coefficients)
Not optimal for music; music features (spectral centroid, chroma, etc.) are often better
Sensitive to microphone characteristics and strong background noise; usually requires normalization
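
The normalization mentioned in the last limitation is often done per utterance as cepstral mean and variance normalization (CMVN). The sketch below is one simple way to write it, assuming the features are a NumPy array of shape (frames, coefficients). The function name `cmvn` and the `eps` guard are illustrative choices, not part of any specific toolkit.

```python
import numpy as np

def cmvn(features, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization.

    features: array of shape (n_frames, n_coeffs).
    Each coefficient track is shifted to zero mean and scaled to unit variance.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)  # eps guards against constant tracks

# Usage with synthetic features: a fixed offset per coefficient, standing in
# for a constant channel effect, is removed by the mean subtraction.
rng = np.random.default_rng(1)
clean = rng.standard_normal((200, 13))
channel_offset = rng.standard_normal((1, 13))
normalized = cmvn(clean + channel_offset)
```

Mean subtraction helps with microphone differences because a fixed linear channel multiplies the spectrum, which becomes an additive constant after the logarithm and therefore an additive constant in each cepstral coefficient.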

Frequently asked

Why use the mel scale instead of Hz?

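Human hearing does not perceive pitch linearly in Hz: perceived pitch tracks frequency roughly linearly below about 1 kHz and roughly logarithmically above it. The mel scale models this, so that equal distances in mels correspond to approximately equal perceived pitch distances; by convention, a 1000 Hz tone is assigned 1000 mels. Spacing the filterbank evenly in mels therefore gives finer resolution at low frequencies and coarser resolution at high frequencies, which better captures perceptual differences.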
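
As a small illustration, the sketch below converts between Hz and mels using one widely used formula, mel = 2595 · log10(1 + f / 700). This particular formula is an assumption here, since several mel variants exist and this page does not specify one.

```python
import math

def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (one common variant of the formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The same 100 Hz step spans many more mels at low frequencies than at high ones.
low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
high_step = hz_to_mel(8000.0) - hz_to_mel(7900.0)
print(f"100 -> 200 Hz:   {low_step:.1f} mel")
print(f"7900 -> 8000 Hz: {high_step:.1f} mel")
```

With this formula, `hz_to_mel(1000.0)` comes out very close to 1000, matching the convention that anchors the scale.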

What are delta and acceleration coefficients?

Delta coefficients capture the rate of change of the MFCCs over time; acceleration (delta-delta) coefficients capture the second derivative. Appending both to the static MFCCs typically improves speech recognition accuracy and gives a 39-dimensional feature vector instead of 13.
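
A common way to estimate these derivatives is a regression over a few neighbouring frames, in the style of the delta formula used by HTK. The sketch below assumes a window of two frames on each side and edge padding by repeating the first and last frames. The function names and the `width` default are illustrative assumptions.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based time derivative of a (n_frames, n_coeffs) array.

    d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)
    Edge frames are handled by repeating the first and last frame.
    """
    n_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom

def add_deltas(static):
    """Stack static, delta, and acceleration features side by side."""
    d = delta(static)
    dd = delta(d)  # acceleration is the delta of the delta
    return np.hstack([static, d, dd])

# Usage: 13 static coefficients per frame become 39 features per frame.
static = np.arange(10, dtype=float)[:, None] * np.ones((1, 13))
full = add_deltas(static)
```

In this toy input every coefficient rises by one per frame, so the delta columns equal 1 for frames far enough from the edges, where padding has no effect.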

How do I choose the number of MFCC coefficients?

For speech, 12-13 static coefficients are standard. Including deltas and accelerations, this gives 39 features in total (13 × 3). For other domains or sample rates, experiment: more coefficients capture finer spectral detail but increase dimensionality and can make models more prone to overfitting.
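
One way to see the trade-off is to measure how much of a log mel spectrum is lost when only the first few cepstral coefficients are kept. The sketch below uses an orthonormal DCT-II and a synthetic 26-band log mel frame. Both helper names and the synthetic input are hypothetical and only meant to illustrate the idea.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * k * (m + 0.5) / n)
    mat[0] /= np.sqrt(2.0)
    return mat

def truncation_error(log_mel, n_coeffs):
    """RMS error after keeping only the first n_coeffs cepstral coefficients."""
    d = dct_matrix(log_mel.shape[-1])
    cepstrum = d @ log_mel
    cepstrum[n_coeffs:] = 0.0   # drop the higher-order coefficients
    approx = d.T @ cepstrum     # inverse transform of the truncated cepstrum
    return float(np.sqrt(np.mean((log_mel - approx) ** 2)))

# Usage: a synthetic frame with a smooth envelope plus fine ripple.
bands = np.arange(26)
log_mel = np.cos(bands / 6.0) + 0.2 * np.sin(bands * 2.5)
for n_coeffs in (5, 13, 20, 26):
    print(n_coeffs, truncation_error(log_mel, n_coeffs))
```

The error can only shrink as more coefficients are kept, and it reaches zero (up to rounding) when all 26 are used. The low-order coefficients carry the smooth envelope, while the higher ones mostly add fine detail.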

Sources

Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 357-366. DOI: 10.1109/TASSP.1980.1163420
Young, S. J., Evermann, G., Gales, M. J., et al. (1996). The HTK Book. Cambridge University Engineering Department.
Moustakides, G. V., & Rougui, J. A. (2004). Optimal filtering for polynomial signal models. IEEE Transactions on Signal Processing, 52(8), 2219-2230.

How to cite this page

ScholarGate. (2026, June 3). Mel-Frequency Cepstral Coefficients. ScholarGate. https://scholargate.app/en/applied-physics/mfcc

Related reference concepts

Automatic Speech Recognition Speech Perception and Intelligibility Frequency, Intensity, and Loudness Perception Acoustic Cues and Formants Speech Perception Psychoacoustics and Auditory Perception
