Word Frequency Analysis: Word and N-gram Counting
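Word frequency analysis is a descriptive text mining method that counts how often words, n-grams, and phrases occur in a corpus in order to reveal content patterns and dominant themes. It builds on the insight about word frequency distributions formalized by George K. Zipf (1949): a small number of words occur very frequently, while most words occur rarely. It is one of the most basic and most widely used entry-level methods in quantitative text analysis.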
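As a rough illustration of the counting step described above, the following minimal sketch tallies unigrams and bigrams over a tiny made-up corpus using only the Python standard library. The helper names `tokenize` and `ngram_counts`, the regex-based tokenizer, and the two sample sentences are assumptions chosen for illustration; they are not part of the method's definition. A real analysis would usually add language-specific tokenization and stop-word handling.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Assumption: a token is a run of ASCII letters, digits, or apostrophes.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(tokens, n=1):
    """Count contiguous n-grams in a token list (n=1 gives word counts)."""
    # Zip the token list against shifted copies of itself to form n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Hypothetical toy corpus: one string per document.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

unigrams = Counter()
bigrams = Counter()
for document in corpus:
    tokens = tokenize(document)
    # Counting per document keeps n-grams from spanning document boundaries.
    unigrams.update(ngram_counts(tokens, 1))
    bigrams.update(ngram_counts(tokens, 2))

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

Even in this toy corpus the skew that Zipf described is visible: "the" accounts for 4 of the 12 tokens, while four of the seven distinct words occur only once.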
Sources
- Zipf, G. K. (1949). Human Behavior and the Principle of Least Effort. Addison-Wesley.
- Manning, C. D. & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press. ISBN: 9780262133609
How to cite this page
ScholarGate. (2026, June 1). Text Frequency Analysis (Word and N-gram Frequency Analysis). ScholarGate. https://scholargate.app/zh/text-mining/frequency-analysis-text