Parametric vs Non-parametric Tests
Which family of test, and when
Parametric tests (t-test, ANOVA, Pearson r) assume that the data follow a specific distributional form — typically normality — and operate on raw numerical values. Non-parametric tests (Mann–Whitney, Wilcoxon, Kruskal–Wallis, Spearman) are rank-based and impose far fewer distributional assumptions, making them suitable for small samples, ordinal data, or violated assumptions — at some cost in statistical power when parametric assumptions genuinely hold.
The Core Distinction: Distributional Assumptions and Raw Values
Parametric tests estimate population parameters from sample statistics such as the mean and standard deviation, and typically assume that the data come from a normally distributed population. For example, the independent-samples t-test presupposes approximate normality in both groups and homogeneity of variance. Non-parametric tests of the kind discussed here convert raw scores to ranks before analysis, and therefore make much weaker assumptions about the shape of the underlying distribution. They are not assumption-free: observations must still be independent, and the Wilcoxon signed-rank test, for instance, assumes that the paired differences are symmetrically distributed. Each family is thus defined by the assumptions it imposes on the data.
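To make the contrast concrete, the following minimal sketch (standard-library Python; the helper name average_ranks and the score values are illustrative assumptions, not taken from the source) replaces one score with an extreme value. The mean and standard deviation shift markedly, while the ranks that a rank-based test would analyse stay the same.

```python
from statistics import mean, stdev

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

scores = [12, 15, 11, 19, 14]          # illustrative data
with_outlier = [12, 15, 11, 190, 14]   # same data, one score made extreme

# Raw-value summaries react strongly to the single extreme score.
print(mean(scores), mean(with_outlier))      # 14.2 vs 48.4
print(stdev(scores), stdev(with_outlier))

# The ranks are identical in both cases.
print(average_ranks(scores))        # [2.0, 4.0, 1.0, 5.0, 3.0]
print(average_ranks(with_outlier))  # [2.0, 4.0, 1.0, 5.0, 3.0]
```

Because the extreme score is still simply "the largest", its rank does not change. This is the sense in which ranking discards information about distributional shape.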
How They Work: The Ranking Logic and Key Formulas
In parametric tests, quantities such as the standard error are central: SE = SD / √n, where SD is the sample standard deviation and n is the sample size. This formula expresses how much the sample mean is expected to vary across repeated samples and forms the basis of inferential reasoning. Rank-based non-parametric tests first rank all observations from smallest to largest, with tied values sharing the average of the ranks they occupy, and then perform calculations on these ranks rather than on the original values. The Mann–Whitney U test compares the ranks of two independent groups; the Kruskal–Wallis test extends this to more than two groups; and the Wilcoxon signed-rank test ranks the absolute paired differences and uses both their sign and magnitude. The result is a procedure that is largely insensitive to outliers and to the shape of the distribution.
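The quantities described above can be computed by hand. The sketch below uses only the Python standard library; the function names (standard_error, midrank, mann_whitney_u, wilcoxon_w_plus) and the data are assumptions made for illustration. It returns test statistics only and does not compute p-values.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """SE = SD / sqrt(n), with SD the sample standard deviation (n - 1 denominator)."""
    return stdev(sample) / sqrt(len(sample))

def midrank(value, pooled):
    """Rank of `value` within `pooled`; tied values share the average rank."""
    less = sum(x < value for x in pooled)
    equal = sum(x == value for x in pooled)
    return less + (equal + 1) / 2

def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b), computed from the rank sum of group_a in the pooled data."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(midrank(v, pooled) for v in group_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a

def wilcoxon_w_plus(before, after):
    """Sum of the ranks of the positive paired differences (zero differences dropped)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    abs_diffs = [abs(d) for d in diffs]
    return sum(midrank(abs(d), abs_diffs) for d in diffs if d > 0)

group_a = [3.1, 4.5, 2.8, 5.0]   # illustrative values
group_b = [5.2, 6.1, 4.9, 7.3]

print(standard_error(group_a))
print(mann_whitney_u(group_a, group_b))                     # (1.0, 15.0)
print(wilcoxon_w_plus([10, 12, 9, 14], [12, 11, 13, 14]))   # 5.0
```

A U of 1 out of a possible 16 means that only one of the 16 cross-group pairs has the group_a value above the group_b value, which shows that the statistic depends on ordering alone.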
Common Misuses and Misconceptions
One of the most frequent misconceptions is that non-parametric tests are always the safer or simpler choice. In reality, when parametric assumptions genuinely hold, parametric tests have greater statistical power, meaning they are more likely to detect a true effect at the same sample size. Equally misleading are rigid rules such as 'large samples make non-parametric tests unnecessary' or 'Likert-scale data always require non-parametric tests'. Sample size, effect size, and level of measurement must be considered together; no single rule covers all situations, and the decision should be grounded in the actual properties of the data.
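The power claim can be explored with a small simulation. The sketch below makes several assumptions that are not in the source: normally distributed data with equal variances, 30 observations per group, a true shift of 0.5 standard deviations, a two-sided 5% level, and a normal approximation (without tie correction) for the Mann–Whitney statistic. It is a rough illustration rather than a definitive power analysis.

```python
import random
from math import sqrt
from statistics import mean, variance

N_PER_GROUP = 30
T_CRIT_DF58 = 2.0017   # two-sided 5% critical value of t with 58 df (2 * 30 - 2)
Z_CRIT = 1.96          # two-sided 5% critical value of the standard normal

def pooled_t(a, b):
    """Student's t statistic for two independent samples (equal variances assumed)."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))

def mann_whitney_z(a, b):
    """Normal approximation to the Mann-Whitney U statistic (no tie correction)."""
    n_a, n_b = len(a), len(b)
    # U counts the cross-group pairs in which the value from `a` is larger.
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return (u - n_a * n_b / 2) / sqrt(n_a * n_b * (n_a + n_b + 1) / 12)

def estimate_power(shift, trials=1000, seed=1):
    """Share of simulated datasets in which each test rejects the null hypothesis."""
    rng = random.Random(seed)
    t_hits = mw_hits = 0
    for _ in range(trials):
        a = [rng.gauss(shift, 1) for _ in range(N_PER_GROUP)]
        b = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
        t_hits += abs(pooled_t(a, b)) > T_CRIT_DF58
        mw_hits += abs(mann_whitney_z(a, b)) > Z_CRIT
    return t_hits / trials, mw_hits / trials

power_t, power_mw = estimate_power(shift=0.5)
print(f"t-test power: {power_t:.2f}, Mann-Whitney power: {power_mw:.2f}")
```

Under these conditions the two rejection rates are expected to be close, with the t-test typically slightly ahead, although any single run is subject to Monte Carlo noise. Replacing the normal draws with a heavily skewed or heavy-tailed distribution is a natural way to see how the comparison changes when the parametric assumptions fail.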
Why It Matters in Research Practice
Choosing the correct family of tests is critical for both statistical validity and the interpretability of findings. A parametric test applied when its assumptions are violated can produce misleading p-values. Conversely, choosing a non-parametric test when the parametric assumptions are satisfied sacrifices power and increases the risk of missing real effects. Researchers are expected to first assess the level of measurement (parametric tests generally call for at least interval-scale data), the normality of the distribution, and the sample size, and then make decisions grounded in that evidence. Clearly reporting which test was used and why it was chosen is a fundamental requirement of scientific transparency.
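One simple descriptive aid when assessing normality is to inspect skewness and excess kurtosis, both of which are close to zero for normally distributed data. The sketch below computes the moment-based versions using the standard library only. The function name shape_summary and the data are illustrative assumptions, and the output is a descriptive summary to be read alongside plots and formal tests, not a decision rule.

```python
from statistics import fmean

def shape_summary(sample):
    """Return (skewness, excess kurtosis) from central moments (population form)."""
    n = len(sample)
    centre = fmean(sample)
    m2 = sum((x - centre) ** 2 for x in sample) / n
    m3 = sum((x - centre) ** 3 for x in sample) / n
    m4 = sum((x - centre) ** 4 for x in sample) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

symmetric = [1, 2, 3, 4, 5]           # illustrative values
right_skewed = [1, 1, 2, 2, 3, 10]    # one large value creates a long right tail

print(shape_summary(symmetric))       # skewness 0.0 for perfectly symmetric data
print(shape_summary(right_skewed))    # positive skewness
```

With very small samples these summaries are themselves unstable, which is one reason sample size has to be weighed together with the apparent shape of the data.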
Sources
- Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE. ISBN: 978-1-5264-1951-4