Which method should I use?

Describe your research situation in a few words; we surface the methods from the library that best fit your goal and data.

Recommendations for: group similar observations into clusters without predefined labels

  1. K-Means Clustering (Machine Learning)

    K-Means Clustering is a centroid-based partitional clustering algorithm, traced to J. MacQueen in 1967, that splits data into k clusters by assigning each observation to its nearest cluster centre. It is widely used for marketing segmentation, customer grouping, and exploratory analysis.

  2. Hierarchical Clustering (Machine Learning)

    Hierarchical clustering is an unsupervised method that groups observations into nested clusters and draws the result as a dendrogram, so the number of clusters need not be fixed in advance. One widely used agglomerative variant, Ward's method, rests on the objective-function grouping criterion introduced by Joe Ward in 1963.

  3. Sentence Embeddings (Deep Learning)

    Sentence Embeddings convert a sentence or short text into a single fixed-length dense vector that captures its semantic meaning. These vectors allow downstream tasks — semantic similarity, clustering, retrieval, and classification — to operate on numerical representations instead of raw text, making them one of the most versatile building blocks in modern NLP pipelines.

  4. Modularity Analysis (Network Analysis)

    Modularity analysis is a network science method, formalized by Newman and Girvan in 2004, that detects community structure in graphs by measuring whether edges are more concentrated within groups than expected by chance. Its scalar quality index Q guides algorithms that partition nodes into cohesive clusters, making it one of the most widely adopted frameworks for community detection in social, biological, and technological networks.

  5. Semi-supervised K-means (Machine Learning)

    Semi-supervised K-means extends standard K-means clustering by incorporating partial supervision (either a small set of labeled seed points or pairwise must-link and cannot-link constraints) to guide cluster formation. It bridges unsupervised clustering and fully supervised classification, enabling more meaningful clusters when a few labels are available but labeling the full dataset is too costly.

  6. Self-supervised DBSCAN (Machine Learning)

    Self-supervised DBSCAN is a two-stage unsupervised pipeline that first trains a neural encoder on a pretext task — such as contrastive learning or masked reconstruction — to produce compact, semantically meaningful embeddings from unlabeled data, and then applies DBSCAN in the resulting embedding space to discover arbitrarily shaped clusters without requiring any class labels.
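
As a rough illustration of the top recommendation, the assign-and-update loop behind K-Means can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the library's implementation: the function name `kmeans`, the toy `points`, the use of squared Euclidean distance, and the choice to seed the centres with the first k observations are all illustrative. Practical implementations usually rely on smarter initialisation and several restarts.

```python
def kmeans(points, k, iters=100):
    """Minimal k-means sketch on a list of equal-length numeric tuples.

    Assumption: the first k points seed the centres (a simplification
    chosen so the example is deterministic).
    """
    centres = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each observation goes to its nearest centre
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centres[j])))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centres


# Hypothetical toy data: two well-separated groups of three observations.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centres = kmeans(points, k=2)
```

On this toy data the first three observations end up in one cluster and the last three in the other. The number of clusters k still has to be chosen by the analyst, which is the main practical difference from the hierarchical option listed above.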

Common question: which method?

The methods the library surfaces for the most frequently asked situations.

Which method compares the means of two or more groups?

Which method predicts a continuous outcome from several variables?

Which method classifies observations into categories?

Which method groups similar observations without labels?

Which method tests the association between two variables?

Which method reduces many correlated variables to a few factors?

Which method ranks alternatives across multiple criteria?

Which method analyzes time-to-event data with censoring?
