Which method should I use?
Describe your research situation in a few words; we surface the methods from the library that best fit your goal and data.
Recommendations for: reduce many correlated variables into a few latent factors or components
- Principal Component Analysis (Machine Learning)
Principal Component Analysis (PCA) is an unsupervised dimensionality-reduction method that compresses high-dimensional data into fewer dimensions while preserving as much of the variance as possible; Ian Jolliffe's 2002 textbook gives its standard modern treatment. It re-expresses correlated variables as a small set of uncorrelated principal components, ordered by how much of the data's variation each one captures.
- Partial Least Squares (Machine Learning)
Partial least squares regression predicts a response from many, often highly collinear, predictors by projecting them onto a small set of latent components. Unlike principal components regression, it chooses those components to maximize their covariance with the response rather than only the variance of the predictors. This supervised dimension reduction makes PLS a workhorse in chemometrics, spectroscopy, and other wide-data settings where predictors vastly outnumber observations.
- Locally Linear Embedding (Machine Learning)
Locally linear embedding, introduced by Sam Roweis and Lawrence Saul in 2000, is a manifold-learning method for nonlinear dimensionality reduction. It assumes that although data may curve through a high-dimensional space, each point and its neighbours lie approximately on a locally flat patch. LLE represents each point as a weighted combination of its neighbours and then finds a low-dimensional layout that preserves those same weights, unrolling curved structure into a faithful low-dimensional map.
- EFA (Statistics)
Exploratory factor analysis reduces a large set of observed variables into a smaller number of latent common factors. It is widely used in scale development and psychometrics to uncover the dimensional structure that underlies a set of correlated items, without specifying that structure in advance.
- Principal Components Regression (Machine Learning)
Principal components regression first compresses a set of correlated predictors into a few principal components (the directions of greatest variance) and then regresses the response on those components. By discarding low-variance directions, PCR stabilizes estimation under multicollinearity and high dimensionality, at the cost of choosing components without reference to the response.
- Bayesian Principal Component Analysis (Statistics)
Bayesian principal component analysis embeds probabilistic PCA within a Bayesian framework, placing priors over the loading matrix so that irrelevant components are automatically pruned. It handles missing data naturally and provides principled uncertainty estimates for both the latent scores and the dimensionality of the representation.
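The Principal Component Analysis entry can be illustrated with a minimal sketch, assuming NumPy is available: centre the data, take the singular value decomposition, and read the principal axes off the right singular vectors. The function `pca` and the synthetic three-column data are hypothetical and are not taken from the library.

```python
import numpy as np

def pca(X, n_components):
    """PCA via the SVD of the column-centred data matrix (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # centre every variable
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S**2 / (X.shape[0] - 1)                  # variance along each principal axis
    components = Vt[:n_components]                 # rows are orthonormal principal axes
    scores = Xc @ components.T                     # coordinates of each row on those axes
    ratio = var[:n_components] / var.sum()         # share of total variance per component
    return scores, components, ratio

# Hypothetical data: three noisy copies of one underlying signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)) for _ in range(3)])
scores, components, ratio = pca(X, n_components=2)
print(np.round(ratio, 3))                          # the first component dominates
```

Because the scores equal `U * S` from the SVD, their columns are mutually orthogonal, which is the "uncorrelated components" property in the PCA description.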
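For the Partial Least Squares entry, here is a hedged sketch of single-response PLS (PLS1) fitted with the NIPALS algorithm, one common way to fit it; the library may use a different algorithm. Each round sets the weight vector proportional to `E.T @ f` on the deflated data, so the first component is the direction of maximal covariance with the response. The function `pls1` and the simulated wide data (more predictors than rows) are hypothetical.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS1 regression fitted with the NIPALS algorithm (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean                  # residual X and y, deflated each round
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                                # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                                  # latent component scores
        tt = t @ t
        p = E.T @ t / tt                           # X loadings for this component
        q = f @ t / tt                             # y loading for this component
        E = E - np.outer(t, p)                     # remove what the component explains
        f = f - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)         # coefficients on the original predictors
    intercept = y_mean - x_mean @ coef
    return coef, intercept, W

# Hypothetical wide data: 100 collinear predictors driven by 2 latent factors, 40 rows.
rng = np.random.default_rng(1)
latent = rng.normal(size=(40, 2))
X = latent @ rng.normal(size=(2, 100)) + 0.05 * rng.normal(size=(40, 100))
y = latent @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=40)
coef, intercept, W = pls1(X, y, n_components=2)
y_hat = X @ coef + intercept
print(np.corrcoef(y, y_hat)[0, 1])
```

With as many components as (full-rank) predictors, PLS1 reduces to ordinary least squares; the benefit comes from stopping early, as with the two components used here.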
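The two stages in the Locally Linear Embedding entry (reconstruction weights from neighbours, then a layout that preserves them) can be sketched densely in NumPy. This toy version assumes a small dataset, since it builds full n × n distance and cost matrices, and it assumes trace-scaled regularisation of the local Gram matrix, a common choice. The function `lle` and the Swiss-roll data are hypothetical; scikit-learn's `sklearn.manifold.LocallyLinearEmbedding` is a production alternative.

```python
import numpy as np

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Dense locally linear embedding (illustrative sketch, O(n^2) memory)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)   # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:n_neighbors + 1]   # k nearest neighbours, excluding i
        Z = X[nbrs] - X[i]                             # neighbours centred on x_i
        G = Z @ Z.T                                    # local Gram matrix (k x k)
        G += reg * np.trace(G) * np.eye(n_neighbors)   # regularise: G is singular when k > dim
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs] = w / w.sum()                       # weights that best rebuild x_i, summing to 1
    I_W = np.eye(n) - W
    M = I_W.T @ I_W                                    # embedding cost matrix
    vals, vecs = np.linalg.eigh(M)                     # eigenvalues in ascending order
    # The constant vector has eigenvalue 0 (rows of W sum to 1), so skip the bottom one.
    return vecs[:, 1:n_components + 1], W

# Hypothetical data: a Swiss roll, a 2-D sheet curled up in 3-D.
rng = np.random.default_rng(2)
t = 1.5 * np.pi * (1 + 2 * rng.random(400))
h = 20 * rng.random(400)
X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])
Y, W = lle(X, n_neighbors=12, n_components=2)
print(Y.shape)
```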
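Exploratory factor analysis, as described in the EFA entry, can be estimated in several ways (maximum likelihood, principal axis factoring, and others). The sketch below assumes iterated principal axis factoring and returns unrotated loadings, leaving out the rotation step (for example varimax) that is usually applied before interpretation. The function `principal_axis_factoring` and the simulated one-factor items are hypothetical.

```python
import numpy as np

def principal_axis_factoring(X, n_factors, max_iter=500, tol=1e-8):
    """Iterated principal axis factoring, returning unrotated loadings (illustrative sketch)."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)   # item correlation matrix
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))           # start: squared multiple correlations
    for _ in range(max_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)              # communalities replace the 1s on the diagonal
        vals, vecs = np.linalg.eigh(R_reduced)
        top = np.argsort(vals)[::-1][:n_factors]     # largest eigenvalues first
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))
        new_h2 = (loadings ** 2).sum(axis=1)         # communality = sum of squared loadings
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2

# Hypothetical data: four items driven by a single latent factor.
rng = np.random.default_rng(3)
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
factor = rng.normal(size=(5000, 1))
noise = rng.normal(size=(5000, 4)) * np.sqrt(1 - true_loadings**2)
X = factor * true_loadings + noise
loadings, h2 = principal_axis_factoring(X, n_factors=1)
print(np.round(np.abs(loadings[:, 0]), 2))
```

The sign of each factor is arbitrary (an eigenvector can be flipped), which is why the printed loadings are taken in absolute value.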
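The two steps in the Principal Components Regression entry (compress, then regress) fit in a short sketch, assuming NumPy is available; `pcr_fit` and the simulated data are hypothetical. With two nearly identical predictors, keeping one component recovers the shared effect, whereas keeping all components reproduces ordinary least squares and its poorly determined split between the predictors.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal components regression (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                         # top-k principal directions (p x k)
    T = Xc @ V                                      # component scores (n x k)
    gamma = np.linalg.lstsq(T, y - y_mean, rcond=None)[0]   # regress y on the scores
    coef = V @ gamma                                # map back to the original predictors
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Hypothetical data: two almost identical predictors.
rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.5 * rng.normal(size=200)
coef_1, _ = pcr_fit(X, y, n_components=1)           # keeps only the shared direction
coef_2, _ = pcr_fit(X, y, n_components=2)           # equals OLS on both predictors
print(np.round(coef_1, 2), np.round(coef_2, 2))
```

The number of components is the tuning knob; in practice it is often chosen by cross-validation, which this sketch does not include.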
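The Bayesian Principal Component Analysis entry can be sketched with the EM algorithm for probabilistic PCA plus one automatic relevance determination (ARD) precision per column of the loading matrix, in the style of Bishop's (1999) formulation. This choice of prior and fitting method is an assumption, not necessarily what the library uses, and the sketch omits missing-data handling and posterior uncertainty. The function `bayesian_pca` and the simulated data are hypothetical.

```python
import numpy as np

def bayesian_pca(X, n_iter=500):
    """EM for Bayesian PCA with an ARD prior on each column of W (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    N, d = X.shape
    q = d - 1                                          # largest latent dimension considered
    Xc = X - X.mean(axis=0)
    lam, U = np.linalg.eigh(Xc.T @ Xc / N)
    lam, U = lam[::-1], U[:, ::-1]                     # eigenpairs in descending order
    W = U[:, :q] * np.sqrt(lam[:q])                    # initialise loadings from ordinary PCA
    s2 = lam[q:].mean()                                # initial noise variance
    alpha = d / np.maximum((W ** 2).sum(axis=0), 1e-12)   # one ARD precision per column
    for _ in range(n_iter):
        # E-step: posterior moments of the latent scores given the current W and s2.
        M = W.T @ W + s2 * np.eye(q)
        M_inv = np.linalg.inv(M)
        Ez = Xc @ W @ M_inv                            # rows are E[z_n]
        sum_Ezz = N * s2 * M_inv + Ez.T @ Ez           # sum over n of E[z_n z_n^T]
        # M-step: MAP update of W under the ARD prior, then the noise variance.
        B = sum_Ezz + s2 * np.diag(alpha)
        W = np.linalg.solve(B, Ez.T @ Xc).T
        s2 = ((Xc ** 2).sum()
              - 2 * (Ez * (Xc @ W)).sum()
              + np.trace(sum_Ezz @ W.T @ W)) / (N * d)
        # Columns whose norm shrinks get a large precision and are pushed towards zero.
        alpha = d / np.maximum((W ** 2).sum(axis=0), 1e-12)
    return W, s2, alpha

# Hypothetical data: 10 observed variables generated from 2 latent dimensions.
rng = np.random.default_rng(5)
Z = rng.normal(size=(500, 2))
W_true = rng.normal(size=(10, 2))
X = Z @ W_true.T + 0.1 * rng.normal(size=(500, 10))
W, s2, alpha = bayesian_pca(X)
norms = np.linalg.norm(W, axis=0)
print(np.round(norms, 3), round(s2, 4))
```

Counting the columns of `W` whose norm stays well above zero gives the effective dimensionality; here that count is expected to match the two latent dimensions used to simulate the data.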
Common question: which method?
For the most frequently asked situations, these are the methods the library surfaces.
Which method compares the means of two or more groups?
- Independent samples t-test (Statistics)
- Welch t-test (Statistics)
- Hotelling's T² Test (Statistics)
Which method predicts a continuous outcome from several variables?
- Multivariate Regression (Statistics)
- Bayesian Multiple Linear Regression (Statistics)
- Robust Multiple Linear Regression (Statistics)
Which method classifies observations into categories?
- Grey Clustering (Soft Computing)
- CNN Image Classification (Deep Learning)
- YOLO (Deep Learning)
Which method groups similar observations without labels?
- K-Means Clustering (Machine Learning)
- Hierarchical Clustering (Machine Learning)
- Sentence Embeddings (Deep Learning)
Which method tests the association between two variables?
- Robust Correlation (Statistics)
- Cramér's V (Statistics)
- Spearman Correlation (Statistics)
Which method reduces many correlated variables to a few factors?
- Principal Component Analysis (Machine Learning)
- Partial Least Squares (Machine Learning)
- Locally Linear Embedding (Machine Learning)
Which method ranks alternatives across multiple criteria?
Which method analyzes time-to-event data with censoring?
- Weibull Regression (Survival)
- Kaplan-Meier Estimator (Statistics)
- Royston-Parmar Model (Survival)