Learning to Rank
Learning to rank applies machine learning to build ranking functions that combine many features, training on labeled relevance data or user feedback to order documents better than a single hand-tuned formula.
Definition
Learning to rank is the use of machine-learning methods to induce a function that orders a set of documents for a query by relevance, trained from examples in which the relative or absolute relevance of documents is known. The problem is usually formulated as pointwise regression or classification, pairwise preference learning, or direct listwise optimization.
Scope
This topic covers supervised and feedback-driven approaches to learning ranking functions for retrieval. It addresses the pointwise, pairwise, and listwise formulations, the use of relevance labels and clickthrough data, representative methods such as RankNet and gradient-boosted ranking trees, and the optimization of rank-based metrics. It treats how a ranker is learned and evaluated as a model, while the assembly of features and the broader serving pipeline are covered under web search ranking.
Core questions
- How are ranking problems cast as pointwise, pairwise, or listwise learning?
- What training signals, such as relevance labels or clickthrough data, drive the learning?
- How can rank-based evaluation metrics, which are non-differentiable, be optimized?
- How are many heterogeneous features combined into a single learned ranker?
- How does click data introduce bias, and how can it be addressed?
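The third question can be made concrete with NDCG, a common graded rank-based metric. The sketch below computes NDCG from a score vector. Because the metric depends on the scores only through their sort order, small score changes that do not swap any documents leave it unchanged, so its gradient is zero almost everywhere. The `lambda_gradients` function illustrates one well-known workaround in the spirit of LambdaRank: weight the RankNet pairwise gradient by the change in NDCG that swapping each pair would cause. This is a minimal sketch under stated assumptions, not a reference implementation; the function names, the gain 2^label - 1, the log2 discount, and the `sigma` parameter are illustrative choices.

```python
import math

def dcg(labels, k=None):
    """DCG of graded labels listed in ranked order (gain 2^label - 1, log2 discount)."""
    return sum((2 ** y - 1) / math.log2(pos + 2) for pos, y in enumerate(labels[:k]))

def ndcg(labels, k=None):
    """DCG normalized by the DCG of the ideal, label-sorted ordering."""
    ideal = dcg(sorted(labels, reverse=True), k)
    return dcg(labels, k) / ideal if ideal > 0 else 0.0

def ndcg_of_scores(scores, labels, k=None):
    """Sort documents by descending score, then evaluate NDCG of the induced ranking."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return ndcg([labels[i] for i in order], k)

def lambda_gradients(scores, labels, sigma=1.0):
    """LambdaRank-style update directions: RankNet pair gradients weighted by |delta NDCG|.

    A positive value means the document's score should increase.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = {doc: pos for pos, doc in enumerate(order)}
    ideal = dcg(sorted(labels, reverse=True))
    lambdas = [0.0] * n
    if ideal == 0:
        return lambdas
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # only consider pairs where i should rank above j
            gain_diff = (2 ** labels[i] - 1) - (2 ** labels[j] - 1)
            disc_diff = 1 / math.log2(rank[i] + 2) - 1 / math.log2(rank[j] + 2)
            delta_ndcg = abs(gain_diff * disc_diff) / ideal  # NDCG change if i and j swapped
            # Magnitude of the RankNet gradient: 1 - sigmoid(sigma * (s_i - s_j))
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lambdas[i] += sigma * rho * delta_ndcg  # push the more relevant doc up
            lambdas[j] -= sigma * rho * delta_ndcg  # push the less relevant doc down
    return lambdas

if __name__ == "__main__":
    labels = [0, 2, 1]
    print(ndcg_of_scores([0.1, 0.9, 0.5], labels))   # 1.0
    print(ndcg_of_scores([0.11, 0.9, 0.5], labels))  # 1.0, unchanged by the small nudge
    print(lambda_gradients([0.9, 0.1, 0.5], labels))
```

In the last call the ordering is wrong, so the most relevant document (index 1) receives a positive lambda and the irrelevant one ranked first (index 0) a negative one, even though NDCG itself offers no gradient.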
Key concepts
- ranking function
- pointwise / pairwise / listwise learning
- relevance labels and graded relevance
- clickthrough and implicit feedback
- RankNet and gradient-boosted trees
- rank-based loss and metric optimization
- feature combination
- position bias
Key theories
- Pointwise, pairwise, and listwise formulations
- Ranking can be learned by predicting each document's relevance independently (pointwise), by learning correct orderings of document pairs (pairwise), or by optimizing a loss over whole result lists (listwise), with the latter aligning most directly with rank-based metrics.
- Learning from clickthrough data
- User clicks provide abundant but biased implicit relevance feedback; treating clicks as relative preferences within a result list lets ranking functions be trained from interaction logs rather than only expensive manual labels.
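The three formulations differ mainly in what the loss looks at. The sketch below takes one query's document scores and graded labels and computes an illustrative loss of each kind: squared error per document (pointwise), a RankNet-style logistic loss over preference pairs (pairwise), and a ListNet-style top-one cross-entropy over the whole list (listwise). The function names and the choice of these particular losses are assumptions for illustration; each family contains many other losses.

```python
import math

def pointwise_loss(scores, labels):
    """Mean squared error between each score and its label; documents treated independently."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)

def pairwise_loss(scores, labels, sigma=1.0):
    """RankNet-style logistic loss averaged over pairs where labels[i] > labels[j]."""
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                d = sigma * (scores[i] - scores[j])
                # -log sigmoid(d) = log(1 + exp(-d)), computed in a numerically stable form
                total += max(-d, 0.0) + math.log1p(math.exp(-abs(d)))
                pairs += 1
    return total / pairs if pairs else 0.0

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def listwise_loss(scores, labels):
    """ListNet-style top-one cross-entropy between label- and score-induced distributions."""
    target = softmax([float(y) for y in labels])
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

if __name__ == "__main__":
    labels = [2, 1, 0]
    for name, scores in [("correct order", [2.0, 1.0, 0.0]), ("reversed", [0.0, 1.0, 2.0])]:
        print(name, pointwise_loss(scores, labels),
              pairwise_loss(scores, labels), listwise_loss(scores, labels))
```

Shifting every score by the same constant leaves the pairwise and listwise losses unchanged, since both depend only on score differences, but it changes the pointwise loss. This is one way to see that the latter two measure ordering rather than calibration to the labels.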
Practical relevance
Learning to rank is the standard way modern search and recommendation systems combine signals, and machine-learned rankers based on gradient-boosted trees and neural models drive the result ordering of major web search engines, e-commerce search, and ad ranking.
History
As web search accumulated many ranking signals, hand-tuning became impractical, motivating machine-learned ranking. Joachims's 2002 work showed that clickthrough data could train rankers; Burges and colleagues' RankNet (2005) introduced neural pairwise ranking and was followed by its descendants LambdaRank and LambdaMART; and Liu's 2009 survey consolidated the field around the pointwise, pairwise, and listwise paradigms.
Key figures
- Tie-Yan Liu
- Christopher Burges
- Thorsten Joachims
Related topics
Seminal works
- liu2009
- burges2005
- joachims2002
Frequently asked questions
- What is the difference between pointwise, pairwise, and listwise learning to rank?
- Pointwise methods predict a relevance score for each document independently; pairwise methods learn which of two documents should rank higher; listwise methods optimize a loss defined over an entire ranked list. Listwise approaches align most closely with the list-level metrics, such as NDCG, by which rankings are actually evaluated.
- Why use click data when it is biased?
- Clicks are far cheaper and more plentiful than manual relevance judgments, so they enable training at scale. The catch is position and presentation bias, which is why methods treat clicks as relative preferences and increasingly apply unbiased or counterfactual learning corrections.
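Treating clicks as relative preferences can be sketched with the "Click > Skip Above" heuristic from Joachims's clickthrough work: a clicked result is taken as preferred over every unclicked result ranked above it, since the user plausibly examined and passed over those. The second function illustrates an inverse-propensity weighting correction for position bias under an assumed examination model in which the probability of examining rank r is (1/r)^eta. Both function names and this propensity model are hypothetical choices for illustration; in practice propensities have to be estimated, for example from randomized interventions.

```python
def skip_above_preferences(ranking, clicked):
    """'Click > Skip Above': each clicked doc is preferred over every unclicked doc above it.

    Returns (preferred, less_preferred) pairs usable as pairwise training examples.
    """
    clicked = set(clicked)
    prefs = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            prefs.extend((doc, above) for above in ranking[:pos] if above not in clicked)
    return prefs

def ips_weights(click_ranks, eta=1.0):
    """Inverse-propensity weights for clicks at 1-based ranks.

    ASSUMPTION (hypothetical examination model): P(examine rank r) = (1 / r) ** eta,
    so the weight 1 / P simplifies to r ** eta. Real propensities must be estimated.
    """
    return [r ** eta for r in click_ranks]

if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4"]
    print(skip_above_preferences(ranking, {"d3"}))  # [('d3', 'd1'), ('d3', 'd2')]
    print(ips_weights([1, 2, 4]))                   # [1.0, 2.0, 4.0]
```

The weights up-weight clicks at low positions, which users are assumed to examine less often, so that a loss summed over weighted clicks is not dominated by whatever the previous ranker placed at the top.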