ScholarGate
Assistant

PageRank and HITS Algorithms

PageRank and HITS are link-analysis algorithms that estimate the importance or authority of web pages from the structure of the hyperlink graph rather than from page content alone.

Definition

PageRank assigns each page a query-independent importance score equal to its stationary probability under a random surfer who follows outgoing links with a damping probability and teleports otherwise, while HITS computes, for a query-dependent subgraph, authority scores for pages with valuable content and hub scores for pages that link to good authorities, each reinforcing the other.

Scope

This topic covers the two foundational link-analysis algorithms: PageRank, a query-independent measure of page importance based on a random-surfer model with teleportation, and HITS, a query-dependent scheme that computes mutually reinforcing hub and authority scores. It treats the underlying random-walk and eigenvector formulations, the role of damping and teleportation, convergence, and topic-sensitive variants. It excludes how these scores are combined with text features into a final ranking, which belongs to web search ranking.

Core questions

  • How does the random-surfer model define PageRank as a stationary distribution?
  • Why is a damping or teleportation factor needed for PageRank to be well defined?
  • How do hub and authority scores in HITS reinforce each other?
  • How are these scores computed iteratively, and when do they converge?
  • How do PageRank and HITS differ in being query-independent versus query-dependent?

Key concepts

  • PageRank
  • random-surfer model
  • damping factor and teleportation
  • stationary distribution / principal eigenvector
  • HITS
  • hubs and authorities
  • query-independent vs. query-dependent scoring
  • power iteration and convergence

Key theories

PageRank random-surfer model
Modeling a surfer who follows links with probability equal to a damping factor and otherwise jumps to a random page yields a Markov chain whose unique stationary distribution scores pages by long-run visitation, giving a stable, query-independent importance measure.
HITS hubs and authorities
Within a query-focused subgraph, HITS assigns each page an authority score proportional to the hub scores of pages linking to it and a hub score proportional to the authority scores of pages it links to, computed by iterating these mutually reinforcing relations to convergence.

Clinical relevance

PageRank was central to the rise of large-scale web search and remains a conceptual reference point for importance and authority on graphs. Link-analysis ideas generalize beyond the web to citation networks, social-network influence, fraud detection, and recommendation, wherever endorsement-like links carry information.

History

In 1998 and 1999, two near-simultaneous proposals showed that hyperlinks encode authority: Kleinberg's HITS and Page and Brin's PageRank. PageRank's query-independent, globally computed scores proved well suited to web-scale serving and underpinned a leading search engine, while HITS influenced topic-distillation and later graph-ranking research.

Key figures

  • Larry Page
  • Sergey Brin
  • Jon Kleinberg
  • Rajeev Motwani

Related topics

Seminal works

  • page1999
  • kleinberg1999
  • brin1998

Frequently asked questions

What is the difference between PageRank and HITS?
PageRank computes a single query-independent importance score for every page from the whole web graph in advance, whereas HITS computes query-dependent hub and authority scores on a small subgraph built around the query at retrieval time. PageRank scales more readily to web serving.
Why does PageRank need a damping factor?
Without it, the random surfer could get trapped in pages with no outgoing links or in cycles, and the stationary distribution might not be unique or well defined. The damping (teleportation) probability lets the surfer occasionally jump to a random page, guaranteeing a unique, well-behaved score.

Methods for this concept

Related concepts