Goodness-of-Fit and Model Error

How well a model fits the data

Goodness-of-fit measures quantify how closely a model reproduces observed data. For continuous outcomes, R-squared reports the proportion of variance explained, RMSE and MAE summarize prediction error in the units of the outcome, and MAPE expresses it as a percentage. For likelihood-based models and for categorical or count data, deviance and the chi-square goodness-of-fit test apply. Good fit on training data does not guarantee good prediction on new data, so predictive error should be judged out of sample.

Core Concept: What Is Goodness-of-Fit?

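A statistical model summarizes observed data through a set of parameters. Goodness-of-fit measures how far the model's predictions deviate from the actual observations. For a continuous outcome, R-squared (R² = 1 − SS_res / SS_tot) is the most common measure, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of the outcome. For a least-squares model with an intercept, evaluated on the data used to fit it, R² lies between 0 and 1 and gives the proportion of variance in the dependent variable explained by the model; computed on new data, or for a model without an intercept, it can be negative. For categorical or count data, the chi-square statistic summarizes discrepancies between observed frequencies O and expected frequencies E: χ² = Σ (O − E)² / E. For likelihood-based models, deviance is used to assess fit, and information criteria such as AIC and BIC are used to compare competing models.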
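The two formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names and the small data sets are made up for the example.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot for paired observations and predictions."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares
    return 1 - ss_res / ss_tot


def chi_square(observed, expected):
    """Pearson chi-square: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical continuous outcome and model predictions
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(y, y_hat)  # SS_res = 0.10, SS_tot = 10, so R^2 is about 0.99

# Hypothetical category counts against a uniform expectation
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]
chi2 = chi_square(observed, expected)  # 308 / 25 = 12.32

print(f"R^2 = {r2:.3f}, chi-square = {chi2:.2f}")
```

Turning the chi-square statistic into a p-value additionally requires the chi-square distribution with the appropriate degrees of freedom, which this sketch leaves out.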

Prediction Error Metrics: RMSE, MAE, and MAPE

R-squared is a unitless proportion, whereas RMSE and MAE express prediction error in the original units of measurement. Root mean squared error, RMSE = √(Σ(y − ŷ)² / n), squares each error before averaging and therefore penalizes large errors more heavily. Mean absolute error, MAE = Σ|y − ŷ| / n, weights every error in proportion to its size and is more robust to outliers. Mean absolute percentage error, MAPE = (100/n) Σ |(y − ŷ)/y|, expresses error as a percentage of the actual values, which aids comparison across scales, but it is undefined when any actual value is zero and unstable when actual values are close to zero. Which metric to report depends on the research question and the measurement scale of the outcome.
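As a rough sketch, the three metrics can be computed as follows. The function names and the four data points are hypothetical, chosen so that a single large error shows how RMSE and MAE diverge.

```python
import math


def rmse(y, y_hat):
    """Root mean squared error: square, average, then take the root."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean absolute error: average of absolute deviations."""
    return sum(abs(a - f) for a, f in zip(y, y_hat)) / len(y)


def mape(y, y_hat):
    """Mean absolute percentage error; rejects zero actual values."""
    if any(a == 0 for a in y):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 / len(y) * sum(abs((a - f) / a) for a, f in zip(y, y_hat))


# Hypothetical actual values and predictions; the last prediction is far off
y = [100.0, 200.0, 300.0, 400.0]
y_hat = [110.0, 190.0, 300.0, 500.0]

print(f"RMSE = {rmse(y, y_hat):.2f}")   # sqrt(2550), about 50.50
print(f"MAE  = {mae(y, y_hat):.2f}")    # 120 / 4 = 30.00
print(f"MAPE = {mape(y, y_hat):.2f}%")  # 25 * 0.40 = 10.00%
```

Here the one error of 100 units pulls RMSE well above MAE, which is the sensitivity to large errors described above.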

Common Misconceptions

The most frequent misconception is that a high R-squared proves a model is correct or useful. R-squared computed on training data reflects only fit to that data; when overfitting occurs, it can be high while the model predicts new observations poorly. Conversely, a low R-squared does not mean a model is meaningless: in noisy behavioral data, a value of 0.20 can be practically valuable. A second misconception concerns the chi-square goodness-of-fit test. For a fixed degree of misfit, the statistic grows in proportion to sample size, so in large samples almost any model that is not exactly correct will be rejected. In structural equation modeling, approximate fit indices such as RMSEA and CFI should therefore be reported alongside the test statistic.
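The sample-size effect on the chi-square statistic can be illustrated with a small sketch. The counts below are hypothetical: the same observed proportions are evaluated at n = 100 and at n = 10,000. For contrast, the sketch also computes Cohen's w = √(χ²/n), an effect size that does not appear in the text above and is included here only to show a quantity that stays constant as n grows. The critical value 7.815 is the 0.05 cutoff of the chi-square distribution with 3 degrees of freedom.

```python
import math


def chi_square(observed, expected):
    """Pearson chi-square: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def cohens_w(chi2, n):
    """Cohen's w effect size: sqrt(chi-square / total sample size)."""
    return math.sqrt(chi2 / n)


CRITICAL_DF3_05 = 7.815  # chi-square critical value, df = 3, alpha = 0.05

# Hypothetical counts with identical proportions (27%, 23%, 26%, 24%)
small = [27, 23, 26, 24]        # n = 100
large = [c * 100 for c in small]  # n = 10,000

results = {}
for label, counts in (("small", small), ("large", large)):
    n = sum(counts)
    expected = [n / len(counts)] * len(counts)  # uniform expectation
    chi2 = chi_square(counts, expected)
    results[label] = (chi2, cohens_w(chi2, n))
    verdict = "rejected" if chi2 > CRITICAL_DF3_05 else "not rejected"
    print(f"n={n}: chi-square={chi2:.2f}, w={results[label][1]:.4f}, {verdict}")
```

With these counts the statistic rises from 0.4 to 40 while w stays near 0.063, so the same small discrepancy goes from not rejected to rejected purely because the sample is larger.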

Reporting and Out-of-Sample Evaluation

Reliable reporting of goodness-of-fit requires presenting at least two complementary metrics and stating clearly which portion of the data was used for each calculation. In predictive work, a training-test split or cross-validation is essential, because error on held-out data estimates performance on new data, whereas training error is optimistically biased. In structural equation modeling, χ², RMSEA, CFI, and SRMR are reported together, since relying on a single index is insufficient. In explanatory modeling, fit indices may be less prominent than the direction and magnitude of coefficients, but they should still be examined, together with residual diagnostics, to check that the model is adequately specified.
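A minimal sketch of out-of-sample evaluation by k-fold cross-validation is shown below, assuming a simple one-predictor least-squares line as the model. The helper names, the synthetic data, and the random seeds are all hypothetical; in practice an established library routine would normally be used instead.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(y, y_hat)) / len(y))


def k_fold_rmse(x, y, k=5, seed=0):
    """Average held-out RMSE over k folds of a shuffled index."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint held-out sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training part only, then score on the held-out part
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        preds = [a + b * x[i] for i in held_out]
        scores.append(rmse([y[i] for i in held_out], preds))
    return sum(scores) / k


# Hypothetical synthetic data: a linear trend plus Gaussian noise
rng = random.Random(42)
x = [i / 10 for i in range(50)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 0.3) for xi in x]

a, b = fit_line(x, y)
train_rmse = rmse(y, [a + b * xi for xi in x])  # computed on the fitting data
cv_rmse = k_fold_rmse(x, y, k=5)                # computed on held-out folds

print(f"training RMSE = {train_rmse:.3f}")
print(f"5-fold CV RMSE = {cv_rmse:.3f}")
```

Reporting both numbers, labeled by the data they were computed on, follows the recommendation above. For a simple model like this one the two values tend to be close; the gap typically widens as model flexibility increases.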