Model Selection and Information Criteria

Choosing among competing models

When comparing statistical models, relying on goodness-of-fit alone is misleading: adding parameters almost never worsens in-sample fit, so the more complex model tends to look better whether or not the extra parameters capture real structure. Information criteria balance how well a model fits the data against the number of parameters it uses, thereby guarding against overfitting. AIC and BIC operationalize this trade-off in different ways; for both, a lower value indicates the preferred model among those compared. Together they formalize the principle of parsimony, rewarding simpler explanations that still capture the signal in the data.

Concept and Logic

When two models are fitted to the same dataset, the model with more parameters almost always achieves a higher likelihood. Comparing models on likelihood or R-squared alone is therefore biased toward the larger model. Information criteria address this by combining goodness-of-fit with a penalty for complexity to produce a single score. The AIC (Akaike Information Criterion) is defined as AIC = -2 x ln(L) + 2k, where L is the maximized likelihood and k is the number of estimated parameters. The BIC (Bayesian Information Criterion) also incorporates the sample size n: BIC = -2 x ln(L) + k x ln(n). In both cases, lower values are preferred.
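The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the log-likelihood values are hypothetical, and the maximized log-likelihood ln(L) is assumed to have been obtained already from whatever software fitted the models.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """AIC = -2 x ln(L) + 2k, with log_lik the maximized log-likelihood."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik: float, k: int, n: int) -> float:
    """BIC = -2 x ln(L) + k x ln(n), with n the sample size."""
    return -2.0 * log_lik + k * math.log(n)


# Hypothetical values: two models fitted to the same n = 200 observations.
n = 200
models = {
    "simple": {"log_lik": -512.4, "k": 3},
    "complex": {"log_lik": -510.9, "k": 6},
}

for name, m in models.items():
    print(name,
          round(aic(m["log_lik"], m["k"]), 2),
          round(bic(m["log_lik"], m["k"], n), 2))
```

In this made-up case the complex model has the higher likelihood, yet both criteria come out lower for the simple model, because the gain in fit does not cover the penalty for three extra parameters.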

Difference Between AIC and BIC, and How to Read Them

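AIC and BIC share the same principle but differ in their penalty terms. AIC charges 2 per parameter regardless of sample size, whereas BIC charges ln(n), which exceeds 2 once n is 8 or larger and keeps growing with n; with large samples BIC therefore tends to favor more parsimonious models than AIC does. For small samples, the corrected version AICc is recommended: AICc = AIC + 2k(k+1)/(n-k-1), where n is the sample size and k is the number of estimated parameters. When interpreting these criteria, differences between models matter more than absolute values. Delta-AIC is a model's AIC minus the lowest AIC in the candidate set. As a rule of thumb, a delta-AIC below 2 means the model is supported about as well as the best one, 4-7 indicates moderate evidence against it, and above 10 constitutes strong evidence against it. Adjusted R-squared and likelihood-ratio tests serve related roles but rest on different assumptions.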
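The small-sample correction and the delta values can be sketched as follows. The helper names `aicc` and `deltas` and the example scores are assumptions made for illustration; the guard on n - k - 1 reflects the fact that the correction term is undefined when the denominator is not positive.

```python
def aicc(aic_value: float, k: int, n: int) -> float:
    """AICc = AIC + 2k(k+1)/(n-k-1); requires n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return aic_value + (2.0 * k * (k + 1)) / (n - k - 1)


def deltas(scores: dict[str, float]) -> dict[str, float]:
    """Difference between each model's score and the lowest score in the set."""
    best = min(scores.values())
    return {name: value - best for name, value in scores.items()}


# Hypothetical AIC values for three candidate models fitted to the same data.
aic_scores = {"A": 1030.8, "B": 1033.8, "C": 1045.1}
print(deltas(aic_scores))

# With k = 3 and n = 20 the correction adds 24 / 16 = 1.5 to the AIC.
print(aicc(100.0, k=3, n=20))
```

The best model always has a delta of zero; here model B sits about 3 units behind and model C more than 14 units behind, which the rule of thumb above would read as strong evidence against C.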

Common Misuses and Misconceptions

The most common error is using information criteria to compare models fitted to different datasets or to differently transformed versions of the dependent variable (for example, log Y versus raw Y); such comparisons are invalid because the likelihoods are not on the same scale. A second misconception is treating a low BIC as evidence that the model is true; these criteria only assess relative support among the candidate models under comparison. A third mistake is reporting raw AIC numbers without computing delta values. Finally, using AIC or BIC as an automated variable-selection routine combined with stepwise procedures can cause problems analogous to p-value inflation, because the selected model looks better supported than it is when the search itself is not accounted for.
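One way to see why the scale of the dependent variable matters is to fit the same model to the same data expressed in different units. The sketch below assumes an intercept-only Gaussian model (k = 2: mean and variance) and made-up response values; the function `gaussian_aic` is hypothetical and is there only to make the point concrete.

```python
import math


def gaussian_aic(y: list[float]) -> float:
    """AIC of an intercept-only Gaussian model (k = 2: mean and variance)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n  # maximum-likelihood variance
    log_lik = -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    return -2.0 * log_lik + 2 * 2


y = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1]  # made-up positive responses

aic_raw = gaussian_aic(y)
aic_rescaled = gaussian_aic([1000.0 * v for v in y])  # same data, other units
aic_log = gaussian_aic([math.log(v) for v in y])      # log-transformed response

print(round(aic_raw, 2), round(aic_rescaled, 2), round(aic_log, 2))
print(round(aic_rescaled - aic_raw, 2), round(2 * len(y) * math.log(1000.0), 2))
```

Rescaling the response changes nothing about how well the model describes the data, yet the AIC shifts by exactly 2n x ln(1000). A gap between the AIC for raw Y and the AIC for log Y likewise reflects the scale of the response, not only the quality of fit, so the two numbers cannot be compared directly.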

Why It Matters and How to Report

Information criteria have become a standard reporting element in studies that test multiple theoretical models on the same data, such as structural equation modeling, regression comparisons, and mixed models. When reporting AIC or BIC, good practice requires stating the software and version used, the criterion values for all candidate models, the delta values, and the rationale for the preferred model. Reporting only the winning model's score is insufficient. These criteria move researchers away from arbitrary decisions and operationalize the data-supported principle of parsimony in a transparent and reproducible way.
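A reporting table along these lines could be assembled as in the sketch below. The model names, log-likelihoods, and the `comparison_table` helper are hypothetical; in practice the log-likelihoods would come from the fitting software, whose name and version should be reported in place of the Python version printed here.

```python
import math
import platform


def comparison_table(models, n):
    """Build rows with AIC, BIC and their deltas, sorted by AIC.

    models: iterable of (name, maximized log-likelihood, number of parameters)
    n: sample size shared by all candidate models
    """
    rows = []
    for name, log_lik, k in models:
        rows.append({
            "model": name,
            "k": k,
            "AIC": -2.0 * log_lik + 2.0 * k,
            "BIC": -2.0 * log_lik + k * math.log(n),
        })
    best_aic = min(r["AIC"] for r in rows)
    best_bic = min(r["BIC"] for r in rows)
    for r in rows:
        r["dAIC"] = r["AIC"] - best_aic
        r["dBIC"] = r["BIC"] - best_bic
    return sorted(rows, key=lambda r: r["AIC"])


# Hypothetical candidates, all fitted to the same n = 200 observations.
candidates = [("M1", -512.4, 3), ("M2", -510.9, 6), ("M3", -525.0, 2)]
table = comparison_table(candidates, n=200)

print(f"Python {platform.python_version()}, n = 200")
print(f"{'model':<6}{'k':>3}{'AIC':>10}{'dAIC':>8}{'BIC':>10}{'dBIC':>8}")
for r in table:
    print(f"{r['model']:<6}{r['k']:>3}{r['AIC']:>10.2f}{r['dAIC']:>8.2f}"
          f"{r['BIC']:>10.2f}{r['dBIC']:>8.2f}")
```

Listing every candidate with its delta values lets readers judge how close the competing models are, which a single winning score cannot convey.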

Sources

  1. Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723. DOI: 10.1109/TAC.1974.1100705