
TF-IDF — Term Frequency–Inverse Document Frequency

TF-IDF, as described by Salton and Buckley (1988), is a term-weighting scheme that scores each term in a document by how often it appears in that document and how rare it is across the whole collection. It turns raw text into weighted document vectors, giving high weight to terms that are frequent in one document but uncommon elsewhere.
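As a rough illustration of the weighting described above, the following is a minimal Python sketch. It assumes one common variant: term frequency as the term's count divided by document length, and inverse document frequency as log(N / df), where N is the number of documents and df is the number of documents containing the term. This is not necessarily the exact weighting evaluated by Salton and Buckley; the function name `tf_idf` and the toy documents are hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy collection (hypothetical), already lowercased and whitespace-tokenized.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
weights = tf_idf(docs)
```

Under this variant, a term that occurs in every document (here "the") gets weight zero, while a term unique to one document (here "mat") outweighs a term shared with another document (here "cat"). Library implementations often add smoothing and vector normalization, so their numbers will generally differ from this sketch.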



Sources

  1. Salton, G. & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513-523. DOI: 10.1016/0306-4573(88)90021-0


ScholarGate. TF-IDF (Term Frequency–Inverse Document Frequency Vectorization). Retrieved 2026-06-04 from https://scholargate.app/en/text-mining/tf-idf