
Automatic Text Evaluation: BLEU, ROUGE, BERTScore

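Automatic text evaluation is a family of reference-based metrics that measure the quality of machine-generated text (such as translations, summaries, or natural language generation (NLG) output) by comparing it with one or more human-written reference texts. The field was pioneered by Papineni et al. in 2002 with BLEU and has since grown to include n-gram overlap metrics (BLEU, ROUGE) and semantics-aware metrics (BERTScore, MoverScore), the latter capturing meaning beyond surface-level word matching.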
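To make the idea of n-gram overlap concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a reference implementation: the function names (`ngrams`, `clipped_precision`, `ngram_recall`) are hypothetical, tokenization is plain whitespace splitting, and the code covers only the clipped n-gram precision that BLEU builds on and the n-gram recall that ROUGE-N builds on. A full BLEU score also combines several n-gram orders and applies a brevity penalty, and semantics-aware metrics such as BERTScore need a pretrained model, so neither is shown here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n):
    """BLEU-style modified n-gram precision against one or more references."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    # For each n-gram, keep the highest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip each candidate count so repeated words cannot inflate the score.
    matched = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return matched / sum(cand.values())


def ngram_recall(candidate, reference, n):
    """ROUGE-N-style recall: share of reference n-grams found in the candidate."""
    ref = ngrams(reference, n)
    if not ref:
        return 0.0
    cand = ngrams(candidate, n)
    matched = sum(min(count, cand[gram]) for gram, count in ref.items())
    return matched / sum(ref.values())


# Toy sentences chosen for illustration only.
candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()

unigram_precision = clipped_precision(candidate, [reference], 1)
bigram_precision = clipped_precision(candidate, [reference], 2)
unigram_recall = ngram_recall(candidate, reference, 1)
```

Precision asks how much of the candidate is supported by a reference, while recall asks how much of the reference the candidate recovers, which is one reason precision-oriented scores are usually associated with translation and recall-oriented scores with summarization. Both count only exact token matches, so a valid paraphrase scores zero on the words it changes. That gap is what the semantics-aware metrics aim to close.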



Sources

Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A Method for Automatic Evaluation of Machine Translation. Proceedings of ACL 2002.
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2020). BERTScore: Evaluating Text Generation with BERT. Proceedings of ICLR 2020.

How to cite this page

ScholarGate. (2026, June 1). Automatic Text Evaluation (BLEU, ROUGE, BERTScore). ScholarGate. https://scholargate.app/zh/text-mining/automatic-text-evaluation
