Automatic Text Evaluation — BLEU, ROUGE, BERTScore

Automatic text evaluation is a family of reference-based metrics that measure the quality of machine-generated text, such as translations, summaries, or other natural-language-generation (NLG) outputs, by comparing it with one or more human-written reference texts. The field began with BLEU (Papineni et al., 2002) and has since grown to include n-gram overlap metrics (BLEU, ROUGE) and semantically aware metrics (BERTScore, MoverScore) that capture meaning beyond surface word matches.
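To make the idea of n-gram overlap concrete, here is a minimal Python sketch of the two quantities that BLEU and ROUGE-N build on: clipped n-gram precision (BLEU-style) and n-gram recall (ROUGE-N-style). The function names and the toy sentences are illustrative assumptions, not part of any library. Full BLEU additionally combines several n-gram orders and applies a brevity penalty, and neither this sketch nor the example handles multiple references, so treat it as an illustration rather than a reference implementation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap(candidate, reference, n):
    """Return (matched, candidate_total, reference_total) n-gram counts."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count per n-gram, so a
    # candidate cannot earn credit by repeating a matching word.
    matched = sum((cand & ref).values())
    return matched, sum(cand.values()), sum(ref.values())


def ngram_precision(candidate, reference, n=1):
    """BLEU-style clipped precision: matched n-grams / candidate n-grams."""
    matched, cand_total, _ = overlap(candidate, reference, n)
    return matched / cand_total if cand_total else 0.0


def ngram_recall(candidate, reference, n=1):
    """ROUGE-N-style recall: matched n-grams / reference n-grams."""
    matched, _, ref_total = overlap(candidate, reference, n)
    return matched / ref_total if ref_total else 0.0


# Hypothetical toy data: whitespace tokenisation, single reference.
candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()

print(ngram_precision(candidate, reference, n=1))  # 5 of 6 unigrams match
print(ngram_precision(candidate, reference, n=2))  # 3 of 5 bigrams match
print(ngram_recall(candidate, reference, n=1))     # 5 of 6 reference unigrams
```

Because these scores count only exact token matches, a correct paraphrase ("the cat rested on the mat") is penalised just like an error. This is the gap that embedding-based metrics such as BERTScore aim to close.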

Sources

  1. Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A Method for Automatic Evaluation of Machine Translation. Proceedings of ACL 2002.
  2. Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2020). BERTScore: Evaluating Text Generation with BERT. Proceedings of ICLR 2020.

ScholarGate. Automatic Text Evaluation (BLEU, ROUGE, BERTScore). Retrieved 2026-06-04 from https://scholargate.app/tr/text-mining/automatic-text-evaluation