M-Estimation and Empirical Processes

M-estimation treats estimators defined by optimizing a sample criterion as a single family, and empirical-process theory supplies the uniform limit theorems needed to analyze them.

Definition

An M-estimator is the maximizer of a sample average of a criterion function, and a Z-estimator is the root of a sample average of an estimating function. The empirical process is the difference between the empirical and true distributions, rescaled by the square root of the sample size and indexed by a class of functions.
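In symbols, the three objects in this definition can be written as follows. The notation (m_θ for the criterion function, ψ_θ for the estimating function, F for the function class) follows common textbook conventions and is an assumption here, since the text above does not fix any symbols.

```latex
% M-estimator: maximizer of a sample average of a criterion function m_\theta
\hat\theta_n \in \arg\max_{\theta \in \Theta} M_n(\theta),
\qquad M_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} m_\theta(X_i)

% Z-estimator: root of a sample average of an estimating function \psi_\theta
\Psi_n(\hat\theta_n) = 0,
\qquad \Psi_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \psi_\theta(X_i)

% Empirical process indexed by a function class \mathcal{F}
\mathbb{G}_n f = \sqrt{n}\,\bigl(\mathbb{P}_n f - P f\bigr),
\qquad \mathbb{P}_n f = \frac{1}{n} \sum_{i=1}^{n} f(X_i),
\quad P f = \int f \, dP,
\quad f \in \mathcal{F}
```

When m_θ is differentiable in θ, taking ψ_θ to be its gradient turns the first display into a special case of the second at any interior maximizer.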

Scope

This topic covers M-estimators, which maximize an objective, and Z-estimators, which solve estimating equations; their unification of maximum likelihood, least squares, quantile, and robust estimators; the consistency and asymptotic normality of M-estimators via uniform convergence; the empirical distribution and the empirical process; weak convergence to a Gaussian process; Glivenko-Cantelli and Donsker classes; and the entropy and bracketing conditions that control complexity.

Core questions

How do M- and Z-estimation unify maximum likelihood, least squares, and robust estimators?
What uniform convergence is needed to prove consistency and asymptotic normality of an M-estimator?
When does the empirical process converge weakly to a Gaussian process, that is, when is a class Donsker?
How do entropy and bracketing conditions control the complexity of a function class?

Key theories

M- and Z-estimation: Estimators defined by optimizing or by setting a sample average to zero share a common asymptotic analysis: a uniform law of large numbers gives consistency and a linearization gives asymptotic normality with a sandwich variance.
Empirical-process weak convergence: Over a Donsker class of functions the empirical process converges weakly to a Gaussian process, generalizing the central limit theorem from a single statistic to a whole function class and underpinning modern asymptotics.
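The Z-estimation recipe in the first entry can be sketched numerically. The example below is a minimal illustration under stated assumptions, not a reference implementation: it uses Huber's clipped estimating function with tuning constant k = 1.345, treats the scale as known and equal to 1, and runs on simulated contaminated data. The function names are hypothetical.

```python
import numpy as np

def huber_psi(r, k=1.345):
    """Huber estimating function: identity near zero, clipped at +/- k."""
    return np.clip(r, -k, k)

def huber_location(x, k=1.345, tol=1e-10, max_iter=200):
    """Solve mean(psi(x - theta)) = 0 for theta by bisection.

    Assumes a known unit scale (an assumption of this sketch).
    """
    lo, hi = float(np.min(x)), float(np.max(x))
    # The averaged estimating function is non-increasing in theta,
    # non-negative at min(x) and non-positive at max(x).
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.mean(huber_psi(x - mid, k)) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def sandwich_se(x, theta, k=1.345):
    """Sandwich standard error sqrt(B / A^2 / n) for the scalar case."""
    r = x - theta
    a = np.mean(np.abs(r) <= k)          # A: average slope of psi at the residuals
    b = np.mean(huber_psi(r, k) ** 2)    # B: average squared estimating function
    return float(np.sqrt(b / a**2 / len(x)))

# Simulated data: 95% standard normal, 5% wide-variance contamination.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 950), rng.normal(0.0, 10.0, 50)])

theta_hat = huber_location(x)
se = sandwich_se(x, theta_hat)
print(f"estimate = {theta_hat:.4f}, sandwich SE = {se:.4f}")
```

Bisection is used because the averaged estimating function is monotone in the location parameter, so a sign change brackets the root; a general-purpose root finder would serve equally well.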

Practical relevance

M-estimation gives the sandwich, or robust, standard errors that remain valid when a model may be misspecified. Empirical-process theory supplies the uniform deviation bounds behind generalization guarantees in statistical learning, connecting classical statistics to machine learning.
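As a rough illustration of robust standard errors under misspecification, the sketch below fits least squares to simulated data whose noise variance grows with the regressor, then compares the classical constant-variance standard errors with a sandwich estimate (the form often labeled HC0). The data-generating process and all variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0.0, 3.0, n)
X = np.column_stack([np.ones(n), x])  # design matrix with intercept

# Noise standard deviation grows with x, so the constant-variance
# assumption behind the classical standard errors is wrong here.
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n) * (0.2 + x**2)

# Least squares is the M-estimator that minimizes the sum of squared residuals.
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

bread = np.linalg.inv(X.T @ X)

# Classical covariance: one common error variance for all observations.
classical_cov = bread * (resid @ resid) / (n - X.shape[1])

# Sandwich covariance: bread @ meat @ bread, with
# meat = sum_i x_i x_i^T * resid_i^2.
meat = X.T @ (X * resid[:, None] ** 2)
robust_cov = bread @ meat @ bread

classical_se = np.sqrt(np.diag(classical_cov))
robust_se = np.sqrt(np.diag(robust_cov))
print("classical SE:", classical_se)
print("sandwich SE: ", robust_se)
```

In this setup the sandwich standard error for the slope comes out larger than the classical one, because the noisiest observations also sit far from the mean of the regressor.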

History

Huber introduced M-estimation for robust statistics in 1964. The empirical-process program, advanced by Dudley, Pollard, and others through the 1970s and 1980s and synthesized in van der Vaart and Wellner's 1996 monograph, supplied the uniform limit theory now standard in asymptotics.

Key figures

Peter J. Huber
Aad van der Vaart
Richard M. Dudley
Jon A. Wellner

Seminal works

van der Vaart (1998), Asymptotic Statistics

Frequently asked questions

What is the difference between an M-estimator and a Z-estimator? An M-estimator maximizes a sample objective function, while a Z-estimator solves a system of estimating equations. When the objective is differentiable and the maximizer lies in the interior of the parameter space, the maximizer is a root of the gradient, so the M-estimator is also a Z-estimator; the converse can fail, since a root of the gradient need not be a global maximizer.
Why does empirical-process theory matter for machine learning? Uniform limit theorems over function classes bound how far empirical error can stray from true error across all candidate models, which is exactly what generalization guarantees require.
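The uniform deviation described in the last answer can be simulated for a very simple model class. The sketch below is a toy setup, not a general result: it assumes one-dimensional inputs uniform on [0, 1], labels given by the rule x > 0.5 and flipped with probability 0.1, and threshold classifiers evaluated on a finite grid. Under those assumptions the true error of each threshold is known in closed form, so the worst-case gap between empirical and true error can be measured directly. The function name is hypothetical.

```python
import numpy as np

def uniform_gap(n, rng, noise=0.1, grid_size=201):
    """Largest |empirical error - true error| over a grid of threshold classifiers."""
    x = rng.uniform(0.0, 1.0, n)
    clean = x > 0.5
    flip = rng.uniform(0.0, 1.0, n) < noise
    y = clean ^ flip  # noisy labels

    thresholds = np.linspace(0.0, 1.0, grid_size)
    # preds[j, i] is the prediction of threshold j on sample point i.
    preds = x[None, :] > thresholds[:, None]
    emp_err = np.mean(preds != y[None, :], axis=1)

    # True error under the assumed model: noise + (1 - 2 * noise) * |t - 0.5|.
    true_err = noise + (1.0 - 2.0 * noise) * np.abs(thresholds - 0.5)
    return float(np.max(np.abs(emp_err - true_err)))

rng = np.random.default_rng(1)
gaps = {
    n: float(np.mean([uniform_gap(n, rng) for _ in range(20)]))
    for n in (100, 1000, 10000)
}
for n, gap in gaps.items():
    print(f"n = {n:>5}: mean worst-case gap = {gap:.4f}")
```

The worst-case gap shrinks as the sample grows, which is the Glivenko-Cantelli behavior for this class. Richer model classes need the entropy or bracketing conditions mentioned above before the same conclusion holds.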