Maximum Likelihood Estimation

Maximum likelihood estimation chooses the parameter value under which the observed data are most probable, providing a general, asymptotically optimal recipe for estimation.

Definition

The maximum likelihood estimator is the value of the parameter that maximizes the likelihood function, that is, the probability or density of the observed data regarded as a function of the parameter.
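In symbols, the definition can be written as follows. This is a sketch under the assumption that the observations x_1, ..., x_n are independent draws from a density or mass function f(x; θ) with parameter space Θ; this notation is introduced here for illustration and is not taken from the text above.

```latex
L(\theta) = \prod_{i=1}^{n} f(x_i;\theta), \qquad
\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i;\theta), \qquad
\hat{\theta} \in \arg\max_{\theta \in \Theta} L(\theta)
             = \arg\max_{\theta \in \Theta} \ell(\theta).
```

The last equality holds because the logarithm is strictly increasing, so L and ℓ share their maximizers.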

Scope

This topic covers the likelihood and log-likelihood functions, the score equations and Fisher information, the existence and computation of maximum likelihood estimators, the invariance property under reparameterization, and the large-sample theory establishing consistency, asymptotic normality, and asymptotic efficiency, together with the regularity conditions these results require and common failures such as boundary and non-regular cases.

Core questions

How is the likelihood function defined, and why is it maximized rather than the probability of the parameter?
What are the score equations, and how does Fisher information enter the solution?
Under what regularity conditions is the maximum likelihood estimator consistent and asymptotically normal?
When does maximum likelihood fail, as in non-regular or boundary problems?

Key theories

Likelihood principle and the score: Inference is driven by the likelihood function; setting the score, the derivative of the log-likelihood with respect to the parameter, to zero gives the estimating equations, and the maximum likelihood estimator is the solution that maximizes the likelihood.
Asymptotic efficiency: Under regularity conditions the maximum likelihood estimator is consistent, asymptotically normal with variance equal to the inverse Fisher information, and asymptotically efficient, attaining the Cramér-Rao bound in the limit.
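The two ideas above can be illustrated with a small simulation. The sketch below assumes an exponential model with rate parameter `rate`, chosen only because its score equation has a closed-form root; the function names, the seed, and the sample sizes are hypothetical choices for illustration. It solves the score equation, then compares the spread of the estimator across repeated samples with the standard deviation implied by the inverse Fisher information.

```python
import math
import random
import statistics


def score(rate, data):
    """Derivative in `rate` of the log-likelihood n*log(rate) - rate*sum(data)."""
    return len(data) / rate - sum(data)


def mle_rate(data):
    """Closed-form root of the score equation: rate_hat = n / sum(data)."""
    return len(data) / sum(data)


def fisher_information(rate, n):
    """Expected information about the rate in n i.i.d. exponential draws."""
    return n / rate ** 2


rng = random.Random(0)  # fixed seed so the run is reproducible
true_rate, n, replications = 2.0, 200, 2000

# One sample: the score vanishes (up to rounding) at the MLE.
sample_for_check = [rng.expovariate(true_rate) for _ in range(n)]
rate_hat = mle_rate(sample_for_check)

# Many samples: the sampling distribution of the MLE.
estimates = []
for _ in range(replications):
    sample = [rng.expovariate(true_rate) for _ in range(n)]
    estimates.append(mle_rate(sample))

empirical_sd = statistics.stdev(estimates)
theoretical_sd = math.sqrt(1.0 / fisher_information(true_rate, n))

print("score at the MLE:", score(rate_hat, sample_for_check))
print("mean of estimates:", statistics.mean(estimates))
print("empirical sd of estimates:", empirical_sd)
print("sd from inverse Fisher information:", theoretical_sd)
```

With a sample size of 200 the two standard deviations should agree closely but not exactly, since the inverse-information variance is a large-sample approximation.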

Practical relevance

Maximum likelihood is the default estimation engine for regression, generalized linear models, mixed models, survival analysis, and most probabilistic machine-learning models, where minimizing a negative log-likelihood loss is equivalent to maximizing likelihood.
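The equivalence between minimizing a negative log-likelihood loss and maximizing likelihood can be checked on a toy case. The sketch below assumes a Bernoulli model with made-up outcomes and uses a plain grid search as a stand-in for an optimizer; `bernoulli_nll` and the grid resolution are illustrative choices, not part of any particular library.

```python
import math


def bernoulli_nll(p, outcomes):
    """Negative log-likelihood (cross-entropy loss) of success probability p."""
    return -sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in outcomes)


# Made-up data for illustration: 7 successes in 10 trials.
outcomes = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# Minimize the loss over a grid of candidate probabilities in (0, 1).
grid = [k / 1000 for k in range(1, 1000)]
p_min_loss = min(grid, key=lambda p: bernoulli_nll(p, outcomes))

# The Bernoulli maximum likelihood estimate is the sample proportion.
p_mle = sum(outcomes) / len(outcomes)

print("loss minimizer on the grid:", p_min_loss)
print("sample proportion (MLE):", p_mle)
```

The grid minimizer coincides with the sample proportion because negating and taking logs reverses the ordering of candidate values without moving the optimum.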

History

Fisher introduced the method in a 1912 paper and developed it, with arguments for its efficiency, in papers of 1922 and 1925. Wald gave rigorous consistency conditions in 1949, and Le Cam's mid-century work clarified the local asymptotic theory that underpins the modern efficiency results.

Key figures

Ronald A. Fisher
Abraham Wald
Lucien Le Cam
Aad van der Vaart

Seminal works

Lehmann, E. L., and Casella, G. (1998). Theory of Point Estimation, 2nd ed. Springer.

Frequently asked questions

Does maximum likelihood always give an unbiased estimator? No. Maximum likelihood estimators can be biased in finite samples. For example, the maximum likelihood estimator of the variance of a normal distribution divides by n rather than n - 1 and so underestimates the variance on average; the bias typically vanishes as the sample grows.
Why maximize the log-likelihood instead of the likelihood? The logarithm is strictly increasing, so the log-likelihood has the same maximizer as the likelihood, but it turns products into sums, which simplifies differentiation and improves numerical stability.
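Both answers can be checked numerically. The sketch below assumes a normal model; the helper names, seed, and sample sizes are hypothetical choices for illustration. The first part shows the raw likelihood of a large sample underflowing in floating point while the log-likelihood stays finite. The second part averages the variance estimator that divides by n over many small samples and compares it with the true variance.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x, computed without exponentiating."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def variance_mle(data):
    """Normal-model MLE of the variance: divides by n, not n - 1."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


rng = random.Random(1)  # fixed seed so the run is reproducible

# Part 1: a product of many densities underflows; a sum of logs does not.
big_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
likelihood = math.prod(normal_pdf(x, 0.0, 1.0) for x in big_sample)
log_likelihood = sum(normal_logpdf(x, 0.0, 1.0) for x in big_sample)
print("likelihood as a float:", likelihood)
print("log-likelihood:", log_likelihood)

# Part 2: finite-sample bias of the variance MLE.
n, true_sigma, replications = 5, 2.0, 20000
mle_values = []
for _ in range(replications):
    sample = [rng.gauss(0.0, true_sigma) for _ in range(n)]
    mle_values.append(variance_mle(sample))

mean_mle = statistics.mean(mle_values)
mean_unbiased = mean_mle * n / (n - 1)  # rescaling to the n - 1 divisor

# In theory E[variance_mle] = (n - 1) / n * sigma^2, here 0.8 * 4.0 = 3.2.
print("average variance MLE:", mean_mle)
print("average after n / (n - 1) correction:", mean_unbiased)
print("true variance:", true_sigma ** 2)
```

The shortfall factor (n - 1) / n approaches 1 as n grows, which is the sense in which the bias vanishes in large samples.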