Gap Statistic
The Gap Statistic, introduced by Tibshirani, Walther, and Hastie (2001), is a statistical method for estimating the number of clusters in a dataset. For each candidate number of clusters, it compares the logarithm of the observed within-cluster dispersion (for example, the within-cluster sum of squares) with its expected value under a null reference distribution that has no cluster structure. A large gap between the two indicates that the data are clustered more tightly than chance alone would produce.
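The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: it uses a plain k-means loop as the clustering step, draws reference sets uniformly over the bounding box of the data (one of the reference choices discussed in the paper), and applies the paper's selection rule of taking the smallest k with Gap(k) >= Gap(k+1) - s(k+1). The function names `within_dispersion`, `gap_statistic`, and `choose_k`, the parameter defaults, and the synthetic three-blob data are all hypothetical and chosen only for this example.

```python
import math
import random


def within_dispersion(points, k, rng, n_init=10, n_iter=50):
    """Best within-cluster sum of squares W_k over n_init runs of plain k-means."""
    best = float("inf")
    for _ in range(n_init):
        centers = rng.sample(points, k)  # random data points as starting centers
        for _ in range(n_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
                clusters[nearest].append(p)
            # Move each center to its cluster mean; keep it in place if the cluster is empty.
            updated = [
                tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
                for j, cl in enumerate(clusters)
            ]
            if updated == centers:
                break
            centers = updated
        # Squared distance of every point to its nearest final center.
        w = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
        best = min(best, w)
    return best


def gap_statistic(points, k_max, n_refs=8, seed=0):
    """Return (gaps, s): gaps[k-1] is Gap(k), s[k-1] is the simulation error s_k."""
    rng = random.Random(seed)
    bounds = [(min(c), max(c)) for c in zip(*points)]
    # Assumption: reference sets are uniform draws over the data's bounding box.
    refs = [
        [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in points]
        for _ in range(n_refs)
    ]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(within_dispersion(points, k, rng))
        ref_logs = [math.log(within_dispersion(ref, k, rng)) for ref in refs]
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)             # Gap(k) = E*[log W_k] - log W_k
        s.append(sd * math.sqrt(1 + 1 / n_refs))  # s_k = sd_k * sqrt(1 + 1/B)
    return gaps, s


def choose_k(gaps, s):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; falls back to the largest k tried."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - s[i + 1]:
            return i + 1
    return len(gaps)


# Hypothetical demo data: three well-separated 2-D Gaussian blobs.
demo_rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
data = [
    (demo_rng.gauss(cx, 0.5), demo_rng.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]
gaps, s = gap_statistic(data, k_max=5)
k_hat = choose_k(gaps, s)
```

Because the reference sets are random, the gap values vary with the seed and with the number of reference sets, so in practice more reference sets than the eight assumed here give a steadier estimate. The sketch also assumes the candidate k stays well below the number of distinct points, since a dispersion of zero has no logarithm.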
Sources
- Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411-423. DOI: 10.1111/1467-9868.00293