Automatic Text Evaluation — BLEU, ROUGE, BERTScore
Automatic text evaluation is a family of reference-based metrics used to measure the quality of machine-generated text — such as translations, summaries, or natural-language-generation (NLG) outputs — by comparing them to one or more human-written reference texts. Pioneered by Papineni et al. with BLEU in 2002, the field has grown to include n-gram overlap metrics (BLEU, ROUGE) and semantically aware metrics (BERTScore, MoverScore) that capture meaning beyond surface word matches.
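To make "n-gram overlap" concrete, here is a minimal sketch of the clipped (modified) n-gram precision that sits at the core of BLEU. The function names `ngrams` and `clipped_precision` are illustrative choices rather than part of any library, and whitespace tokenization is assumed. This is only one component of BLEU: the full metric also combines several n-gram orders and applies a brevity penalty, both omitted here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references.

    Each candidate n-gram is credited at most as many times as it occurs
    in the single reference where it is most frequent.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0  # candidate is shorter than n tokens

    # For every n-gram, record its maximum count over all references.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical toy data: a degenerate candidate that repeats one word.
candidate = "the the the the the the the".split()
references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]

print(clipped_precision(candidate, references, 1))  # 2/7, roughly 0.2857
print(clipped_precision(candidate, references, 2))  # 0.0
```

Clipping is what keeps the degenerate candidate from scoring a perfect unigram precision: "the" appears at most twice in any single reference, so only two of the seven candidate tokens are credited.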
Sources
- Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A Method for Automatic Evaluation of Machine Translation. Proceedings of ACL 2002.
- Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2020). BERTScore: Evaluating Text Generation with BERT. Proceedings of ICLR 2020.