Newton-Raphson and Scoring Methods
Newton-Raphson and related scoring methods maximize a likelihood by repeatedly taking steps based on the gradient and curvature of the log-likelihood, achieving fast local convergence near the optimum.
Definition
Newton-Raphson and scoring methods are iterative optimization algorithms that update a parameter estimate by solving a local quadratic model of the log-likelihood, using the gradient (score) and a Hessian or information matrix to determine the step.
Scope
This topic covers the Newton-Raphson iteration applied to the score equations, Fisher scoring which replaces the observed information by its expectation, quasi-Newton methods that approximate curvature from gradients, the role of step-size and line-search safeguards, and the link between curvature at the optimum and the asymptotic variance of the estimator.
Core questions
- How does a local quadratic approximation produce the Newton step for the score equations?
- How does Fisher scoring differ from Newton-Raphson, and why is it often preferred?
- How do quasi-Newton methods approximate curvature without computing the Hessian?
- How do line searches and modifications keep the iteration stable far from the optimum?
Key concepts
- Score function
- Hessian and information matrix
- Quadratic convergence
- Fisher scoring
- Quasi-Newton update
- Line search
Key theories
- Newton iteration on the score
- Treating maximum-likelihood estimation as solving the score equations, the Newton step uses the inverse Hessian times the gradient and converges quadratically once close to the maximum.
- Fisher scoring and quasi-Newton
- Replacing the observed information by the expected information gives Fisher scoring, which is often more stable, while quasi-Newton updates build a curvature approximation from successive gradients to avoid forming the Hessian directly.
Clinical relevance
Fisher scoring is the default fitting algorithm for generalized linear models through iteratively reweighted least squares, and Newton and quasi-Newton methods fit countless nonlinear statistical models; the curvature these methods compute also yields standard errors for the estimates.
History
The Newton-Raphson root-finding method predates statistics, but Fisher's introduction of scoring tied it to likelihood estimation; mid-twentieth-century numerical analysis added quasi-Newton methods, which together became the backbone of statistical model fitting.
Key figures
- Isaac Newton
- Joseph Raphson
- Ronald A. Fisher
- Jorge Nocedal
Related topics
Seminal works
- givens2013
- nocedal2006
Frequently asked questions
- Why does Newton-Raphson converge so quickly near the optimum?
- It fits a local quadratic model using both the slope and curvature of the objective, so each step lands very close to the true optimum, giving quadratic convergence. The trade-off is that it needs the Hessian and can be unstable far from the solution.
- When is Fisher scoring preferred over plain Newton-Raphson?
- Fisher scoring uses the expected information, which is often positive-definite and simpler to compute than the observed Hessian, making the iteration more stable. It is the standard method behind fitting generalized linear models.