Compare methods
Review your selected methods side by side; rows that differ are marked with ≠.
| | Normalized Discounted Cumulative Gain (nDCG) | BM25 Probabilistic Ranking (Okapi) | Citation Context and Sentiment Analysis | Mean Average Precision (MAP) |
|---|---|---|---|---|
| Field | Bibliometrics | Bibliometrics | Bibliometrics | Bibliometrics |
| Family | Process / pipeline | Process / pipeline | Process / pipeline | Process / pipeline |
| Year of origin≠ | 2002 | 2009 | 2006 | 2000 |
| Originator≠ | Kalervo Järvelin & Jaana Kekäläinen | Stephen Robertson; Karen Spärck Jones; Hugo Zaragoza (Okapi team, City University London) | Simone Teufel, Advaith Siddharthan & Dan Tidhar (citation function); Awais Athar (citation sentiment) | TREC / information-retrieval evaluation community; Chris Buckley & Ellen Voorhees (stability analysis) |
| Type≠ | Graded-relevance ranking-evaluation pipeline with position discounting and normalization | Probabilistic term-weighting and document-scoring pipeline for ranked retrieval | NLP pipeline for classifying the rhetorical function and polarity of citations | Binary-relevance ranked-retrieval evaluation pipeline |
| Seminal source≠ | Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422-446. | Robertson, S., & Zaragoza, H. (2009). The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval, 3(4), 333-389. | Teufel, S., Siddharthan, A., & Tidhar, D. (2006). Automatic classification of citation function. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), 103-110. | Buckley, C., & Voorhees, E. M. (2000). Evaluating evaluation measure stability. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00), 33-40. |
| Aliases | nDCG, Discounted Cumulative Gain, DCG/IDCG Normalization, Cumulated Gain Evaluation | Okapi BM25, Best Matching 25, Probabilistic Relevance Ranking, BM25 Term Weighting | Citation Function Classification, Citation Polarity Analysis, Citation Sentiment Detection, Citation Context Mining | MAP, Average Precision, AP, Mean AP |
| Related | 3 | 3 | 3 | 3 |
| Summary≠ | Normalized Discounted Cumulative Gain (nDCG) is the standard metric for evaluating ranked retrieval and recommendation when relevance comes in grades rather than a simple relevant/non-relevant binary. Introduced by Kalervo Järvelin and Jaana Kekäläinen in their 2002 ACM Transactions on Information Systems paper on cumulated gain-based evaluation, nDCG rewards a system for placing highly relevant documents near the top of the ranking. It accumulates the graded relevance ('gain') of each retrieved item, discounts that gain by how far down the list the item sits, and normalizes the total against the best possible ordering so that scores fall on a comparable 0-to-1 scale across queries. Because it handles multi-level relevance and is rank-sensitive, nDCG has become the dominant effectiveness measure for web search, learning-to-rank, and academic-search evaluation. | BM25, the Okapi 'Best Matching 25' function, is the dominant classical ranking function in information retrieval and the workhorse term-weighting scheme behind most lexical search engines and bibliographic databases. Developed by Stephen Robertson, Karen Spärck Jones and colleagues at City University London and formalized in Robertson and Zaragoza's 2009 monograph on the Probabilistic Relevance Framework, BM25 scores a document against a query as a sum, over query terms, of inverse-document-frequency weights multiplied by a saturating, length-normalized transform of within-document term frequency. Two free parameters control how quickly repeated terms stop adding evidence (k1) and how strongly document length is penalized (b). BM25 consistently outperformed plain TF-IDF in the TREC evaluations and remains the standard first-stage retrieval baseline against which modern neural rankers are measured. | Citation context and sentiment analysis is the scientometric text-mining technique that reads the words around a citation to recover why one paper cites another and with what attitude. Standard citation counting treats every citation as an equal, polarity-free vote, but Simone Teufel, Advaith Siddharthan and Dan Tidhar's 2006 EMNLP work showed that citations serve distinct rhetorical functions (using a method, contrasting with prior work, acknowledging a basis, or merely referencing in passing) and that these functions can be classified automatically from the citing sentence. Awais Athar's 2011 work extended this to sentiment, distinguishing positive, neutral, and negative (critical) citations using sentence-structure features. Together these methods turn the raw citation graph into a typed, sentiment-bearing graph, enabling more meaningful impact measures, better citation indexers, and summaries of how a paper has been received. | Mean Average Precision (MAP) is the classic single-number summary of ranked-retrieval effectiveness under binary relevance and the headline metric of the TREC ad hoc retrieval tracks. For a single query, average precision (AP) computes the precision of the result list at each rank where a relevant document appears and averages those values, rewarding systems that rank all relevant documents highly; MAP is then the mean of AP across a set of queries. Buckley and Voorhees's 2000 SIGIR analysis of evaluation-measure stability showed that average precision is among the most stable and discriminating IR measures, requiring fewer queries than alternatives like precision at a fixed cutoff to reliably tell two systems apart. MAP remains a standard reporting metric for ranked retrieval, complementing graded-relevance measures such as nDCG. |
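
The nDCG and MAP summaries in the table describe concrete computations, which can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes a log2(rank + 1) discount with raw grades used as gains, an ideal ordering built only from the retrieved items, and an AP denominator equal to the number of relevant documents (defaulting to those retrieved). The table fixes none of these choices, and the function names are hypothetical.

```python
import math


def dcg(gains):
    """Sum graded gains, each divided by log2(rank + 1), with ranks starting at 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG of the ranking divided by the DCG of the best possible ordering.

    Assumption: the ideal ordering is the same gains sorted in descending order.
    """
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def average_precision(ranked_relevance, total_relevant=None):
    """Average of precision@k over the ranks k that hold a relevant document.

    ranked_relevance: 0/1 flags in ranked order (binary relevance).
    total_relevant: relevant documents for the query; assumed equal to the
    number of relevant documents retrieved when omitted.
    """
    hits = 0
    precisions = []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    denom = total_relevant if total_relevant is not None else hits
    return sum(precisions) / denom if denom else 0.0


def mean_average_precision(runs):
    """Mean of AP over a set of queries, one ranked 0/1 list per query."""
    return sum(average_precision(r) for r in runs) / len(runs) if runs else 0.0
```

For example, `ndcg([0, 3])` is below 1 because the only relevant item sits at rank 2, while `average_precision([1, 0, 1])` averages the precision at ranks 1 and 3.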
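
The BM25 summary in the table describes the score as a sum, over query terms, of an IDF weight multiplied by a saturating, length-normalized term-frequency transform. The sketch below is one way to write that down. It assumes a commonly used smoothed IDF, ln(1 + (N - n + 0.5) / (n + 0.5)), and the conventional defaults k1 = 1.2 and b = 0.75, none of which the table states. The function name, whitespace tokenization, and toy corpus are illustrative only.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in docs against the tokenized query.

    k1 controls how quickly repeated terms stop adding evidence;
    b controls how strongly document length is penalized.
    """
    n_docs = len(docs)
    if n_docs == 0:
        return []
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization folded into the saturation constant.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Assumed smoothed IDF variant; it stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    "citation analysis of retrieval systems".split(),
    "probabilistic ranking for ranked retrieval retrieval".split(),
    "sentiment of citations".split(),
]
scores = bm25_scores("ranked retrieval".split(), corpus)
```

In this toy run the second document matches both query terms and should score highest, while the third matches neither and scores zero. Setting `b=0` turns off length normalization, and a smaller `k1` makes repeated terms saturate faster.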