Why does Newton-Raphson converge so quickly near the optimum?

It fits a local quadratic model using both the slope and curvature of the objective, so each step lands very close to the true optimum, giving quadratic convergence. The trade-off is that it needs the Hessian and can be unstable far from the solution.

When is Fisher scoring preferred over plain Newton-Raphson?

Fisher scoring uses the expected information, which is often positive-definite and simpler to compute than the observed Hessian, making the iteration more stable. It is the standard method behind fitting generalized linear models.

Newton-Raphson and Scoring Methods

Newton-Raphson and related scoring methods maximize a likelihood by repeatedly taking steps based on the gradient and curvature of the log-likelihood, achieving fast local convergence near the optimum.

Etsi aihe työkalulla PaperMindTulossaFind papers & topics

Tools & resources

Lataa diat

Learn & explore

VideoTulossa

Definition

Newton-Raphson and scoring methods are iterative optimization algorithms that update a parameter estimate by solving a local quadratic model of the log-likelihood, using the gradient (score) and a Hessian or information matrix to determine the step.

Scope

This topic covers the Newton-Raphson iteration applied to the score equations, Fisher scoring which replaces the observed information by its expectation, quasi-Newton methods that approximate curvature from gradients, the role of step-size and line-search safeguards, and the link between curvature at the optimum and the asymptotic variance of the estimator.

Core questions

How does a local quadratic approximation produce the Newton step for the score equations?
How does Fisher scoring differ from Newton-Raphson, and why is it often preferred?
How do quasi-Newton methods approximate curvature without computing the Hessian?
How do line searches and modifications keep the iteration stable far from the optimum?

Key concepts

Score function
Hessian and information matrix
Quadratic convergence
Fisher scoring
Quasi-Newton update
Line search

Key theories

Newton iteration on the score: Treating maximum-likelihood estimation as solving the score equations, the Newton step uses the inverse Hessian times the gradient and converges quadratically once close to the maximum.
Fisher scoring and quasi-Newton: Replacing the observed information by the expected information gives Fisher scoring, which is often more stable, while quasi-Newton updates build a curvature approximation from successive gradients to avoid forming the Hessian directly.

Clinical relevance

Fisher scoring is the default fitting algorithm for generalized linear models through iteratively reweighted least squares, and Newton and quasi-Newton methods fit countless nonlinear statistical models; the curvature these methods compute also yields standard errors for the estimates.

History

The Newton-Raphson root-finding method predates statistics, but Fisher's introduction of scoring tied it to likelihood estimation; mid-twentieth-century numerical analysis added quasi-Newton methods, which together became the backbone of statistical model fitting.

Key figures

Isaac Newton
Joseph Raphson
Ronald A. Fisher
Jorge Nocedal

Seminal works

givens2013
nocedal2006

Frequently asked questions

Why does Newton-Raphson converge so quickly near the optimum?: It fits a local quadratic model using both the slope and curvature of the objective, so each step lands very close to the true optimum, giving quadratic convergence. The trade-off is that it needs the Hessian and can be unstable far from the solution.
When is Fisher scoring preferred over plain Newton-Raphson?: Fisher scoring uses the expected information, which is often positive-definite and simpler to compute than the observed Hessian, making the iteration more stable. It is the standard method behind fitting generalized linear models.