
Gap Statistic

The Gap Statistic, introduced by Tibshirani, Walther, and Hastie in 2001, is a statistical method for estimating the number of clusters in a dataset. For each candidate number of clusters k, it compares the logarithm of the observed within-cluster dispersion W_k (the pooled within-cluster sum of squares) with its expected value under a null reference distribution that has no clustering structure. The estimate is the smallest k for which Gap(k) >= Gap(k+1) - s_{k+1}, where s_{k+1} accounts for the simulation error in the reference expectation.
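The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: it takes Gap(k) as the mean of log W*_k over B reference sets minus log W_k, draws each reference set uniformly over the bounding box of the data (the simpler of the two reference choices in the original paper), and uses a small pure-Python k-means with farthest-first initialisation as a stand-in for whatever clustering routine is actually used. The function names (`sq_dist`, `kmeans_dispersion`, `gap_statistic`), the default parameters, the fallback to `k_max`, and the synthetic demo data are all hypothetical choices made for this example.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_dispersion(points, k, rng, n_init=3, n_iter=50):
    """Smallest within-cluster sum of squares W_k found over a few Lloyd runs."""
    best = math.inf
    for _ in range(n_init):
        # Farthest-first initialisation: an assumption made for simplicity,
        # not k-means++ and not part of the gap statistic itself.
        centers = [rng.choice(points)]
        while len(centers) < k:
            centers.append(
                max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
            )
        for _ in range(n_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
                clusters[nearest].append(p)
            # Move each center to its cluster mean; an empty cluster keeps its center.
            updated = [
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[j]
                for j, members in enumerate(clusters)
            ]
            if updated == centers:
                break
            centers = updated
        # Sum of squared distances from each point to its nearest center.
        w_k = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, w_k)
    return best


def gap_statistic(points, k_max=5, n_refs=10, seed=0):
    """Return (k_hat, gaps, s) using the rule Gap(k) >= Gap(k+1) - s_{k+1}.

    Assumes k_max is well below the number of distinct points, so W_k > 0.
    """
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans_dispersion(points, k, rng))
        ref_logs = []
        for _ in range(n_refs):
            # Reference set: uniform draws over the bounding box of the data.
            ref = [
                tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                for _ in points
            ]
            ref_logs.append(math.log(kmeans_dispersion(ref, k, rng)))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)
        s.append(sd * math.sqrt(1 + 1 / n_refs))
    # gaps[k - 1] is Gap(k); s[k] is s_{k+1}.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps, s
    # Fallback when no k qualifies (a choice made for this sketch).
    return k_max, gaps, s


# Synthetic demo data (hypothetical): two tight, well-separated 2-D blobs.
demo_rng = random.Random(42)
data = [
    (demo_rng.gauss(cx, 0.5), demo_rng.gauss(cy, 0.5))
    for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
    for _ in range(30)
]
k_hat, gaps, s = gap_statistic(data)
print("estimated number of clusters:", k_hat)
```

In practice the number of reference sets is usually larger than the default used here, and the k-means step would normally come from an established library. The small values only keep the sketch fast and dependency-free.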

Sources

  1. Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411-423. DOI: 10.1111/1467-9868.00293


ScholarGate. Gap Statistic (Gap Statistic for Cluster Evaluation). Retrieved 2026-06-04 from https://scholargate.app/en/model-evaluation/gap-statistic