Glossary
Research, statistics and methods terms — concise, sourced definitions that complement the full method and topic pages.
A
- Admissible decision rule: In statistical decision theory, a decision rule is admissible if no other rule dominates it, meaning no alternative rule has uniformly smaller or equal risk for all parameter values with strictly smaller risk for at least one. Inadmissible rules are suboptimal and can always be replaced by a superior rule.
- Algebra of random variables: The mathematical framework defining arithmetic operations (addition, subtraction, multiplication, and division) on random variables, where results are themselves random variables. It encompasses rules for computing expectations, variances, and distributions of derived variables, forming the basis for probability theory and statistical modeling.
- Alternative hypothesis: In hypothesis testing, the hypothesis accepted when the null hypothesis is rejected; it represents the claim the researcher seeks to support. Denoted H₁ or Hₐ, it may be one-sided (e.g., μ > μ₀) or two-sided (μ ≠ μ₀), and is never directly tested but supported by evidence against the null.
- Analysis of variance: A statistical method that partitions total variability in a response variable into components attributable to different sources (between-group and within-group variation) to test whether group means differ significantly. It uses the F-statistic and was developed by R.A. Fisher. Commonly abbreviated as ANOVA.
- Atomic event: A single-element subset of a sample space that cannot be decomposed further; also called an elementary or simple event. For example, rolling a '3' on a die is an atomic event. All atomic events are mutually exclusive and collectively exhaustive, forming the finest-grained partition of the sample space.
B
- Bar chart: A graph representing categorical data with rectangular bars whose lengths are proportional to the frequencies, counts, or other quantities they represent. Bars may be oriented vertically or horizontally and are separated by gaps to emphasize the discrete nature of the categories. Widely used for comparative visualization.
- Bayes estimator: A statistical estimator that minimizes the expected posterior loss under a specified loss function. Under squared error loss, it equals the posterior mean of the parameter. It combines prior information with observed data through Bayes' theorem and is optimal in the Bayesian decision-theoretic sense.
- Bayes factor: A ratio quantifying the relative evidence provided by data for one hypothesis over another: BF₁₀ = P(data|H₁) / P(data|H₀), computed as the ratio of marginal likelihoods. Values greater than one favor H₁; it is used in Bayesian model comparison as an alternative to p-values and automatically penalizes model complexity.
- Bayes' theorem: A fundamental theorem relating conditional probabilities: P(A|B) = P(B|A)·P(A) / P(B). It provides a principled way to update the probability of a hypothesis given new evidence, combining a prior probability with a likelihood to yield a posterior probability. It is the cornerstone of Bayesian inference.
- Bayesian inference: A statistical inference framework that represents uncertainty about parameters as probability distributions and updates a prior distribution to a posterior distribution using Bayes' theorem after observing data. It provides a coherent approach to parameter estimation, model comparison, and prediction, treating parameters as random rather than fixed quantities.
- Bias: The systematic difference between the expected value of an estimator and the true parameter value: Bias(θ̂) = E(θ̂) − θ. An estimator with zero bias is called unbiased. Bias can arise from poor sampling design, model misspecification, or measurement error, and is distinct from random sampling variability (variance).
- Binary data: Data that can take only two possible values, typically coded as 0/1, yes/no, or success/failure. Also called dichotomous data. Binary outcomes are modeled by Bernoulli and binomial distributions and analyzed using logistic regression, probit models, and chi-squared tests, making them fundamental to categorical data analysis.
- Binomial distribution: The probability distribution of the number of successes in n independent Bernoulli trials, each with success probability p. Its probability mass function is P(X=k) = C(n,k)pᵏ(1−p)ⁿ⁻ᵏ, with mean np and variance np(1−p). It is fundamental in quality control, clinical trials, and survey sampling.
- Bivariate analysis: Statistical analysis that simultaneously examines the relationship between exactly two variables. Common methods include correlation coefficients, cross-tabulations, simple linear regression, and two-sample t-tests. Bivariate analysis reveals the direction, strength, and form of association between variables, serving as a bridge between univariate description and multivariate modeling.
- Blocking: An experimental design technique in which experimental units are grouped into homogeneous blocks to reduce variability from known nuisance factors. Treatments are randomly assigned within blocks, allowing between-block variability to be separated from experimental error, thereby increasing the precision of treatment comparisons.
- Box plot: A graphical display summarizing a dataset's distribution using five statistics: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. The box spans the interquartile range (IQR), whiskers extend to non-outlier extremes, and points beyond indicate outliers. Developed by John Tukey for exploratory data analysis.
- Box-Jenkins method: A systematic iterative approach to time series modeling developed by George Box and Gwilym Jenkins, consisting of three stages: identification of an appropriate ARIMA model using ACF/PACF plots, parameter estimation, and diagnostic checking. Its seasonal extension, SARIMA, handles periodic patterns in data.
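As an illustration of the Binomial distribution entry above, the following minimal Python sketch evaluates the probability mass function and checks numerically that the probabilities sum to one and that the mean and variance match np and np(1−p). The values n = 10 and p = 0.3 and the helper name `binomial_pmf` are assumptions chosen for the example, not part of the glossary.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Illustrative parameters (assumed for this sketch).
n, p = 10, 0.3

# Probability of each possible number of successes, k = 0..n.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

# total should be close to 1, mean close to n*p, variance close to n*p*(1-p).
print(f"total={total:.6f} mean={mean:.6f} variance={variance:.6f}")
```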
C
- Causal study: A study designed to establish whether changes in one variable cause changes in another. Randomized controlled trials (RCTs) are the gold standard for causal inference; observational causal studies require rigorous control of confounding through methods such as instrumental variables, propensity score matching, or difference-in-differences.
- Central limit theorem: A fundamental theorem stating that the standardized sample mean of n independent, identically distributed random variables with finite mean and variance converges in distribution to the standard normal as n → ∞, regardless of the underlying distribution. It provides the theoretical foundation for most frequentist inferential procedures.
- Central moment: The expected value of the r-th power of the deviation from the mean: μᵣ = E[(X − μ)ʳ]. The first central moment is zero; the second is the variance; the third and fourth, once standardized, yield skewness and kurtosis, respectively. Central moments characterize the shape of a distribution and are invariant to location shifts.
- Characteristic function: The Fourier transform of a probability distribution, defined as φ_X(t) = E[e^{itX}]. Unlike the moment generating function, it exists for every distribution and for all real t, and it uniquely determines the distribution. It is especially useful for deriving distributions of sums of independent random variables and proving limit theorems.
- Chi-squared distribution: The distribution of the sum of squares of k independent standard normal random variables, parameterized by k degrees of freedom. It has mean k and variance 2k, is non-negative and right-skewed. It underlies chi-squared goodness-of-fit and independence tests, and is used in inference about population variances.
- Chi-squared test: A statistical test that uses the chi-squared distribution to assess whether observed frequencies deviate from expected frequencies. Its two principal forms are: (1) goodness-of-fit test, assessing whether a sample matches a hypothesized distribution; and (2) test of independence, evaluating association between two categorical variables in a contingency table.
- Cluster analysis: An exploratory multivariate technique that partitions observations into groups (clusters) such that objects within a cluster are more similar to each other than to those in other clusters. Common algorithms include k-means, hierarchical clustering, and DBSCAN. It is an unsupervised method with no predefined class labels.
- Cluster sampling: A probability sampling method in which the population is divided into naturally occurring groups (clusters), a random sample of clusters is selected, and all units within chosen clusters (or a subsample thereof) are observed. It reduces cost in large geographically dispersed populations but may be less statistically efficient than simple random sampling.
- Complementary event: The complement of an event A, denoted Aᶜ or Ā, consists of all outcomes in the sample space not in A, satisfying P(Aᶜ) = 1 − P(A). Event A and its complement are mutually exclusive and exhaustive. The complement rule is frequently used to simplify probability calculations by solving for the easier complementary case.
- Completely randomized design: The simplest experimental design in which experimental units are assigned to treatments completely at random, with no blocking or restriction. It assumes homogeneity of experimental units and is analyzed by one-way ANOVA. While easy to implement and analyze, it is less efficient than blocked designs when unit heterogeneity is present.
- Computational statistics: The discipline applying computationally intensive algorithms to statistical problems where analytical solutions are unavailable or inadequate. Key methods include bootstrap resampling, Markov chain Monte Carlo (MCMC), permutation tests, the EM algorithm, and cross-validation. It bridges classical statistics and modern machine learning.
- Conditional distribution: The probability distribution of one or more random variables given that other variables are fixed at specific values. Obtained by normalizing the joint distribution over the conditioning event; it captures how knowledge of some variables reshapes the distributional behavior of the remaining variables.
- Conditional probability: The probability of event A occurring given that event B has occurred, defined as P(A|B) = P(A∩B)/P(B) for P(B) > 0. It formalizes updated uncertainty in light of new information and forms the foundation of Bayes' theorem and probabilistic reasoning.
- Confidence interval: An interval estimate computed from sample data that is expected to contain the true population parameter with a specified confidence level. A (1−α)×100% confidence interval means that if the procedure were repeated many times, that percentage of intervals would capture the true parameter value.
- Confidence level: The long-run proportion of confidence intervals, constructed by a given procedure, that contain the true population parameter. Expressed as (1−α), typically 0.90, 0.95, or 0.99. A higher confidence level produces wider intervals, reflecting the trade-off between precision and coverage probability.
- Confounder: A variable that is associated with both the exposure (independent variable) and the outcome (dependent variable), thereby distorting the estimated causal relationship between them. Uncontrolled confounding can inflate, attenuate, or reverse an observed association, threatening internal validity.
- Conjugate prior: A prior distribution that, when combined with a given likelihood function, yields a posterior distribution belonging to the same parametric family. For example, the Beta distribution is conjugate to the Binomial likelihood. Conjugacy enables closed-form Bayesian updating, greatly simplifying posterior computation.
- Continuous variable: A variable that can take any value in an interval of the real line, with probabilities defined by a probability density function rather than a mass function. Examples include height, weight, and temperature. The probability of any single exact value is zero; probabilities are assigned only over intervals.
- Convenience sampling: A non-probability sampling method in which participants are selected based on their ready availability to the researcher. While convenient and inexpensive, it is highly susceptible to selection bias, making it difficult or impossible to generalize findings to a broader population with statistical justification.
- Correlation: A standardized measure of the direction and strength of the linear relationship between two variables, ranging from −1 to +1. The Pearson correlation coefficient is r = Cov(X,Y)/(σ_X σ_Y). A value of zero indicates no linear association; correlation does not imply causation.
- Count data: Data consisting of non-negative integers recording how many times an event occurs in a fixed interval of time or space. Count data are typically modeled with Poisson or Negative Binomial distributions. Examples include the number of hospital visits, accidents at an intersection, or defects per batch.
- Covariance: The expected value of the product of the deviations of two random variables from their respective means: Cov(X,Y) = E[(X−μ_X)(Y−μ_Y)]. Positive covariance indicates that the variables tend to move together; negative, that they move oppositely. Because it is scale-dependent, the correlation coefficient is often preferred.
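The Correlation and Covariance entries above note that covariance is scale-dependent while the correlation coefficient is not. A small Python sketch can make that concrete by expressing the same measurements in two units. The data values and the function names `covariance` and `pearson_r` are made up for illustration, and the population (divide-by-n) form of the covariance is assumed.

```python
from math import sqrt


def covariance(xs, ys):
    """Population covariance: mean product of deviations from the two means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def pearson_r(xs, ys):
    """Pearson r = Cov(X, Y) / (sigma_X * sigma_Y)."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical paired measurements; ys is given in metres, then in centimetres.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_metres = [2.0, 4.1, 5.9, 8.2, 9.8]
ys_cm = [100 * y for y in ys_metres]

# Changing units multiplies the covariance by 100 but leaves r unchanged.
print(covariance(xs, ys_metres), covariance(xs, ys_cm))
print(pearson_r(xs, ys_metres), pearson_r(xs, ys_cm))
```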
D
- Data analysis: The systematic process of cleaning, transforming, modeling, and interpreting raw data to discover patterns, test hypotheses, and support decision-making. Encompasses descriptive, exploratory, and confirmatory phases, and draws on statistical methods, computational tools, and domain knowledge to extract meaningful insights.
- Data set: A structured collection of related observations or measurements assembled for a specific research or analytical purpose. Conventionally organized with rows representing individual units of observation and columns representing variables. Proper documentation of the unit of analysis, variable types, and measurement scales is essential for valid analysis.
- Decision rule: A rule that partitions the sample space into a rejection region and a non-rejection region, specifying when to reject the null hypothesis in a statistical test or which action to take in a decision problem. Formally determined by the significance level α, the test statistic distribution, and whether the test is one- or two-tailed.
- Decision theory: A mathematical and statistical framework for rational decision-making under uncertainty. It formalizes choices among actions using loss functions, risk functions, Bayes risk, and minimax criteria. Decision theory unifies estimation and hypothesis testing as special cases and provides the foundation for Bayesian and frequentist optimal inference.
- Degrees of freedom: The number of independent pieces of information available to estimate a parameter or quantity, typically calculated as sample size minus the number of estimated parameters. Degrees of freedom determine the shape of t, chi-squared, and F distributions; they decrease with each constraint imposed on the data.
- Density estimation: The process of estimating the underlying probability density function of a random variable from observed data. Parametric methods assume a specific distributional form; nonparametric methods such as kernel density estimation impose minimal assumptions. The histogram is the simplest nonparametric density estimator.
- Dependent variable: The outcome variable that the researcher seeks to explain or predict, modeled as a function of one or more independent variables. Denoted Y in regression; in experimental design it is the variable on which the effect of manipulation is measured. Also called the response variable or outcome variable.
- Descriptive statistics: The branch of statistics concerned with summarizing the main features of a data set through numerical measures and graphical displays. Encompasses measures of central tendency (mean, median, mode), dispersion (variance, standard deviation, range), and shape (skewness, kurtosis). Does not involve inference to a broader population.
- Design of experiments: The systematic planning of experiments to efficiently and validly estimate the effects of independent variables while controlling for confounding. Fundamental principles include randomization, replication, and blocking. Common designs include completely randomized, factorial, randomized complete block, and split-plot designs.
- Deviation: The difference between an observed value and a reference point, most commonly the arithmetic mean: d_i = x_i − x̄. Deviations are the building blocks of variance and standard deviation. Their signed sum equals zero; absolute or squared deviations form the basis of mean absolute deviation and variance respectively.
- Discrete variable: A variable that can take only a countable number of distinct values, with probabilities defined by a probability mass function. Examples include the outcome of a die roll and the number of customers arriving per hour. The Binomial, Poisson, and Geometric distributions are standard models for discrete random variables.
- Dot plot: A simple graphical display in which each observation is represented as a dot along a numerical axis. Identical values are stacked or jittered. Dot plots are effective for small to moderate data sets, clearly showing the distribution of individual observations, clusters, gaps, and outliers without loss of individual data points.
E
- Elementary event: An event consisting of exactly one outcome from the sample space; it cannot be decomposed into simpler events. The sample space is the union of all elementary events, which are mutually exclusive and exhaustive. For a die roll, {3} is an elementary event; compound events are unions of one or more elementary events.
- Estimation theory: The branch of statistics concerned with inferring unknown population parameters from observed data. Encompasses point estimation, interval estimation, least squares, maximum likelihood, and Bayesian estimation. Key properties used to evaluate estimators include unbiasedness, consistency, efficiency, and sufficiency.
- Estimator: A statistic (a function of the sample data) used to estimate an unknown population parameter. As a random variable it has a sampling distribution. The sample mean is a common estimator of the population mean. The estimator is the rule or formula; an estimate is the specific numerical value computed from a realized sample.
- Expected value: The probability-weighted average of all possible values of a random variable, denoted E[X]. Computed as Σ x·P(X=x) for discrete variables and ∫ x·f(x)dx for continuous variables. It serves as the distribution's center of mass, and the expectation operator is linear: E[aX+bY] = aE[X]+bE[Y].
- Exponential family: A class of probability distributions whose density or mass function can be written as f(x|θ) = h(x)·exp[η(θ)·T(x) − A(θ)]. Includes the Normal, Binomial, Poisson, Gamma, and Bernoulli distributions. The family admits sufficient statistics, conjugate priors, and a unified information-geometric structure, making it central to statistical theory.
F
- Factor analysis: A multivariate statistical method that explains the correlation structure among observed variables in terms of a smaller number of unobserved latent variables called factors. It takes two main forms, exploratory (EFA) and confirmatory (CFA), and is widely used in scale development and dimensionality reduction.
- Factorial experiment: An experimental design in which all possible combinations of levels for two or more factors are studied simultaneously. It allows estimation of main effects as well as interaction effects between factors, and comes in full factorial and fractional factorial variants.
- Frequency distribution: A tabular or graphical summary showing how often each distinct value or class interval occurs in a data set. It can be expressed as absolute frequencies, relative (proportional) frequencies, or cumulative frequencies, and serves as a fundamental tool for data summarization.
- Frequency domain: The analytical framework in which a signal or time series is represented in terms of its frequency components rather than time. Obtained via the Fourier transform, it is used to examine periodic patterns, spectral density, and filter behavior in time-series and signal analysis.
- Frequentist inference: The statistical inference paradigm that interprets probability as the long-run relative frequency of outcomes in repeated experiments. Parameters are treated as fixed but unknown quantities; hypothesis testing, confidence intervals, and maximum likelihood estimation are its principal tools.
G
- General linear model: A multivariate linear model relating one or more continuous dependent variables to continuous and categorical predictors under the assumption of normally distributed errors. It unifies ANOVA, ANCOVA, and multiple regression within a single framework, written in matrix form as Y = XB + E.
- Generalized linear model: A flexible extension of the linear model in which the response variable may follow any distribution from the exponential family, with the linear predictor linked to the mean via a link function. Logistic, Poisson, and gamma regression are all special cases of this framework.
- Grouped data: Data reduced to class intervals and corresponding frequency counts rather than individual observations. Summary statistics such as the mean and variance are approximated using class midpoints; commonly used in large-sample summaries and historical statistical computation.
H
I
- Independence: The condition in which two events or random variables have no influence on each other's probability distribution. Formally expressed as P(A ∩ B) = P(A)P(B) for events; a foundational assumption underlying many statistical methods including the t-test, ANOVA, and ordinary least squares regression.
- Independent variable: A variable used to explain or predict the dependent variable in a statistical model; it is either manipulated by the researcher or observed as a naturally varying quantity. Also called a predictor, covariate, or explanatory variable in regression, and a factor in experimental design.
- Interquartile range: The difference between the third quartile (Q3) and the first quartile (Q1) of a data set: IQR = Q3 − Q1. It measures the spread of the middle 50% of the data, is robust to outliers, and is widely used in box plots and outlier detection procedures.
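The Interquartile range entry above can be sketched in a few lines of Python using the standard library's `statistics.quantiles`. Two points here are assumptions rather than part of the glossary: quartile conventions differ between textbooks and packages (the "inclusive" method is used below), and the 1.5 × IQR fences are one common outlier rule associated with Tukey's box plot. The data set is invented for the example.

```python
from statistics import quantiles

# Hypothetical data set with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]

# Quartiles under the "inclusive" convention; other conventions give
# slightly different cut points for small samples.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed outlier rule: flag points more than 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1} Q3={q3} IQR={iqr} outliers={outliers}")
```

Because the IQR depends only on the middle half of the data, replacing 40 with a much larger value leaves it unchanged, which is the robustness the entry refers to.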
J
- Joint distribution: A probability distribution that simultaneously characterizes the values of two or more random variables. Expressed as a joint probability density function for continuous variables or a joint probability mass function for discrete variables; it is the basis for deriving marginal and conditional distributions.
- Joint probability: The probability that two or more events occur simultaneously, denoted P(A ∩ B) or P(A, B). For independent events, P(A ∩ B) = P(A)P(B); for dependent events, the multiplication rule gives P(A ∩ B) = P(A)P(B|A). It is the foundation for joint distributions of random variables.
K
- Kalman filter: An optimal recursive Bayesian filter that estimates the hidden state of a linear dynamical system from noisy measurements. It operates in two steps, prediction and update, and is widely applied in state-space models, navigation systems, signal processing, and time-series smoothing.
- Kernel density estimation: A nonparametric method for estimating the probability density function of a random variable from data without assuming a parametric form. A kernel function (commonly Gaussian) is centered at each observation and the results are summed and normalized; bandwidth selection controls the degree of smoothing.
- Kurtosis: The fourth standardized moment of a distribution, measuring the heaviness of its tails and the sharpness of its peak, computed as μ₄/σ⁴. A value of 3 corresponds to the normal distribution; values above 3 indicate heavy tails (leptokurtic) and values below 3 indicate light tails (platykurtic).
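The Kernel density estimation entry above describes centering a kernel at each observation, then summing and normalizing. A minimal pure-Python sketch of that idea with a Gaussian kernel follows. The sample values, the fixed bandwidth of 0.5, and the function name `gaussian_kde` are assumptions for illustration; in practice the bandwidth would be chosen by a selection rule or cross-validation.

```python
from math import exp, pi, sqrt


def gaussian_kde(data, bandwidth):
    """Return a density estimate f_hat(x) built from Gaussian kernels."""
    n = len(data)
    # Normalizing constant so that the estimate integrates to one.
    norm = 1.0 / (n * bandwidth * sqrt(2 * pi))

    def density(x):
        # Sum of Gaussian bumps, one centered at each observation.
        return norm * sum(exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


# Hypothetical sample and bandwidth.
sample = [1.2, 1.9, 2.1, 2.4, 3.7, 5.0]
f_hat = gaussian_kde(sample, bandwidth=0.5)

# Rough check with a Riemann sum over [-5, 10]: the area should be close to 1.
step = 0.01
grid = [-5 + i * step for i in range(1501)]
area = sum(f_hat(x) for x in grid) * step

print(f"f_hat(2.0)={f_hat(2.0):.4f} area={area:.4f}")
```

A smaller bandwidth produces a spikier estimate that follows individual observations, while a larger one smooths them together.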
L
- L-moment: Summary statistics based on linear combinations of order statistics, introduced by J. R. M. Hosking. They are more robust to outliers than conventional moments and are widely used in extreme-value analysis, hydrology, and climatology for estimating distribution parameters from small or heavy-tailed samples.
- Law of large numbers: A fundamental theorem of probability stating that the sample mean of independent, identically distributed random variables converges to the true population mean as sample size grows. The weak form guarantees convergence in probability; the strong form guarantees almost-sure convergence.
- Likelihood function: A function of the model parameters, with observed data fixed, giving the probability of the data under those parameters: L(θ|x) = P(x|θ). The maximum likelihood estimator is the parameter value that maximizes this function. It is central to both frequentist and Bayesian inference.
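- Likelihood-ratio test: A hypothesis test comparing the likelihoods of two nested models to assess whether the restrictions imposed by the null model are supported by the data. Under the null hypothesis, the test statistic Λ = −2 ln(L₀/L₁) converges to a chi-squared distribution in large samples, with degrees of freedom equal to the number of restrictions. For two simple hypotheses, the Neyman-Pearson lemma establishes the likelihood-ratio test as the most powerful test at a given significance level.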
- Loss function: In statistical decision theory, a function measuring the cost of estimating a parameter with a value that deviates from the truth. Common examples include squared error, absolute error, and 0-1 loss. Bayes estimators are derived by minimizing the posterior expected loss, which in turn minimizes the Bayes risk (the risk averaged over the prior).
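To make the Likelihood function and Likelihood-ratio test entries above concrete, here is a small Python sketch for Bernoulli data. It compares the null value p = 0.5 against the maximum likelihood estimate and forms the statistic −2 ln(L₀/L₁). The counts (62 successes in 100 trials), the null value, and the helper name `bernoulli_log_likelihood` are assumptions for the example; 3.841 is the usual 5% critical value of the chi-squared distribution with one degree of freedom.

```python
from math import log


def bernoulli_log_likelihood(p, successes, trials):
    """Log-likelihood of p for the given counts (binomial coefficient omitted,
    since it cancels in the likelihood ratio)."""
    return successes * log(p) + (trials - successes) * log(1 - p)


# Hypothetical data and null hypothesis.
trials, successes = 100, 62
p_null = 0.5

# For Bernoulli data the likelihood is maximized at the sample proportion.
p_hat = successes / trials

log_l0 = bernoulli_log_likelihood(p_null, successes, trials)
log_l1 = bernoulli_log_likelihood(p_hat, successes, trials)

# Test statistic: -2 ln(L0 / L1) = -2 (ln L0 - ln L1).
lr_statistic = -2 * (log_l0 - log_l1)

# One restriction (p fixed at 0.5), so compare with chi-squared, 1 df.
CHI2_1DF_CRITICAL_5PCT = 3.841
reject_null = lr_statistic > CHI2_1DF_CRITICAL_5PCT

print(f"p_hat={p_hat} statistic={lr_statistic:.3f} reject={reject_null}")
```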
M
- M-estimator: A broad class of estimators defined by minimizing or maximizing a sum of a chosen objective (loss or score) function, formalized by Peter Huber. Maximum likelihood and least squares are special cases; non-quadratic loss functions yield robust estimators resistant to outliers and heavy-tailed errors.
- Marginal distribution: The probability distribution of a subset of variables obtained from a joint distribution by summing (discrete case) or integrating (continuous case) over all possible values of the remaining variables. It describes the behavior of one variable without conditioning on the others.
- Marginal likelihood: In Bayesian statistics, the integral of the likelihood weighted by the prior over all parameter values: p(x) = ∫ p(x|θ)p(θ)dθ, also called the model evidence. It serves as the normalizing constant of the posterior and is used in model comparison via Bayes factors.
- Marginal probability: The probability of a single event or variable in a joint probability distribution, considered without reference to the values of the other variables. Obtained by summing or integrating the joint distribution over all values of the other variables: P(A) = Σ_B P(A, B).
- Markov chain Monte Carlo: A family of algorithms that draw samples from high-dimensional probability distributions by constructing a Markov chain whose stationary distribution is the target distribution. The Metropolis-Hastings algorithm and Gibbs sampler are principal examples; MCMC is the dominant computational tool in Bayesian posterior inference.
- Mathematical statistics: The discipline that provides the mathematical foundations of statistical methods, encompassing estimation theory, hypothesis testing, sufficiency, completeness, and information theory. Built on probability theory, it rigorously studies the properties of estimators and inference procedures using formal mathematical proof.
- Maximum likelihood estimation: A statistical estimation method that finds the parameter values maximizing the likelihood of the observed data, that is, the values most consistent with the observed sample. Under regularity conditions, maximum likelihood estimators are consistent, asymptotically efficient, and asymptotically normally distributed.
- Mean: A measure of central tendency computed by dividing the sum of all values in a dataset by the number of observations. Also called the arithmetic mean, it weights every value equally. The mean is sensitive to outliers, which can pull it away from the bulk of the distribution.
- Median: The middle value of an ordered dataset that divides the distribution into two equal halves. For an odd number of observations it is the central value; for an even number, the arithmetic mean of the two central values. The median is robust to outliers and skewed distributions.
- Median absolute deviation: The median of the absolute deviations of each observation from the dataset's median, abbreviated MAD. It is a highly robust measure of statistical dispersion, resistant to outliers. For normally distributed data, multiplying MAD by approximately 1.4826 yields a consistent estimate of the standard deviation.
- Mode: The value or values that appear most frequently in a dataset. It is the only measure of central tendency applicable to nominal categorical data. A distribution may have one mode (unimodal), two modes (bimodal), or more (multimodal), and some datasets have no mode at all.
- Moving average: A sequence of averages computed over successive overlapping windows of a time series, smoothing out short-term fluctuations to reveal underlying trends. Common variants include simple, weighted, and exponentially weighted moving averages. Widely used in finance, demand forecasting, and signal processing.
- Multilevel model: A regression model designed to analyze hierarchically nested or clustered data, such as students within schools or patients within hospitals. It simultaneously models variability at individual and group levels, correcting for the standard error bias that arises from ignoring within-group correlation. Also known as the mixed-effects model or hierarchical linear model.
- Multimodal distribution: A probability distribution with two or more distinct peaks (modes). Multimodality often indicates that the data arise from a mixture of distinct subpopulations. Bimodal and trimodal distributions are common special cases, typically analyzed using mixture models or cluster analysis.
- Multivariate analysis: A collection of statistical techniques for simultaneously analyzing multiple variables. Methods include MANOVA, principal component analysis, factor analysis, cluster analysis, and discriminant analysis. Multivariate analysis uncovers relationships, patterns, and latent structure among variables that univariate methods cannot detect.
- Mutual exclusivity: The property of two or more events that cannot occur simultaneously. If A and B are mutually exclusive, then P(A ∩ B) = 0, and the addition rule simplifies to P(A ∪ B) = P(A) + P(B). Mutual exclusivity does not imply independence; in fact, two mutually exclusive events with positive probability are always dependent.
- Mutual independence: A property of a collection of events whereby every subset satisfies the multiplication rule: the joint probability equals the product of the individual probabilities. Mutual independence implies pairwise independence, but pairwise independence is not sufficient for mutual independence. It is the strongest form of stochastic independence.
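The Mean, Median, and Median absolute deviation entries above contrast outlier-sensitive and robust summaries. The short Python sketch below shows the difference on a small made-up sample with one extreme value. The function name `mad` and the data are assumptions; the 1.4826 factor is the normal-consistency constant quoted in the entry.

```python
from statistics import mean, median, pstdev


def mad(data, scale=1.0):
    """Median of absolute deviations from the median, times an optional scale."""
    center = median(data)
    return scale * median([abs(x - center) for x in data])


# Hypothetical sample: four ordinary values and one extreme observation.
data = [1, 2, 3, 4, 100]

raw_mad = mad(data)
scaled_mad = mad(data, scale=1.4826)  # comparable to a standard deviation

# The mean and standard deviation are pulled by the extreme value;
# the median and MAD are not.
print(f"mean={mean(data)} median={median(data)}")
print(f"stdev={pstdev(data):.2f} MAD={raw_mad} scaled MAD={scaled_mad:.4f}")
```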
N
- Non-sampling errorErrors arising not from the sampling process but from measurement error, data entry mistakes, non-response, coverage gaps, or processing failures. Non-sampling errors can occur in both sample surveys and censuses. Unlike sampling error, they cannot be reduced by increasing sample size and may introduce systematic bias.
- Nonparametric regressionRegression methods that do not assume a predetermined functional form for the relationship between the response and predictors. Examples include kernel regression, local polynomial (LOESS) smoothing, and spline methods. Nonparametric regression is flexible and data-driven but typically requires larger samples than parametric alternatives.
- Nonparametric statisticsStatistical methods that do not require strong assumptions about the functional form of the population distribution. These methods rely on ranks, signs, or permutations rather than distributional parameters. Examples include the Mann-Whitney U test, Kruskal-Wallis test, and Spearman correlation. Preferred for small samples or ordinal data.
- Normal distributionA continuous, symmetric, bell-shaped probability distribution characterized by its mean μ and variance σ². Also called the Gaussian distribution, it is fully described by these two parameters. The central limit theorem establishes its foundational role in statistics; approximately 68%, 95%, and 99.7% of values lie within one, two, and three standard deviations of the mean.
- Normal probability plotA graphical method for assessing whether a dataset follows a normal distribution. Observed values are plotted against the corresponding theoretical normal quantiles; if the data are normally distributed, the points fall approximately along a straight diagonal line. It is a specific case of the quantile-quantile (Q-Q) plot.
- Null hypothesisThe default hypothesis in statistical testing, typically asserting no effect, no difference, or independence between variables; denoted H₀. The test statistic is computed assuming H₀ is true. Rejecting H₀ when the p-value falls below the significance level provides evidence in favor of the alternative hypothesis.
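The 68%, 95%, and 99.7% figures quoted for the normal distribution can be reproduced from the standard normal cumulative distribution function. A minimal sketch using Python's `statistics.NormalDist`; the helper name `coverage` is introduced here for illustration only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(coverage(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```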
O
- Opinion pollA survey conducted on a sample drawn from a large population to measure public attitudes, preferences, or opinions on a specific issue. The validity of results depends critically on sampling method, sample size, question wording, and non-response rates. Widely used for election forecasting and public policy research.
- Optimal decisionThe choice among available alternatives that yields the best outcome under a specified decision criterion such as maximizing expected utility or minimizing expected loss. In Bayesian decision theory, the optimal decision minimizes posterior expected loss. It is central to statistical decision theory, operations research, and rational choice frameworks.
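- Optimal designAn approach to experimental design in which the allocation of experimental runs is chosen to optimize a mathematical criterion based on the Fisher information matrix. Common criteria include D-optimality (maximizing the determinant of the information matrix, which minimizes the generalized variance of the parameter estimates), A-optimality (minimizing the average variance of the estimates), and G-optimality (minimizing the maximum prediction variance). Useful when classical factorial designs are infeasible.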
- OutlierAn observation that deviates markedly from the overall pattern of the data and appears inconsistent with the rest of the sample. Outliers may result from measurement error, data entry mistakes, or genuine extreme values. They can substantially distort means, variances, and regression estimates, so their cause should be investigated before any action is taken.
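The rule that an optimal Bayesian decision minimizes posterior expected loss can be sketched as follows. The states, posterior probabilities, and loss values are hypothetical numbers chosen for illustration, not data from any study.

```python
# Hypothetical posterior over two states of nature.
posterior = {"disease": 0.2, "healthy": 0.8}

# Hypothetical loss table: loss[action][state].
loss = {
    "treat": {"disease": 0.0, "healthy": 1.0},
    "no_treat": {"disease": 10.0, "healthy": 0.0},
}


def expected_loss(action):
    """Posterior expected loss of an action: sum over states of P(state) * loss."""
    return sum(posterior[state] * loss[action][state] for state in posterior)


# The optimal decision is the action with the smallest posterior expected loss.
best = min(loss, key=expected_loss)

for action in loss:
    print(action, expected_loss(action))
print("optimal:", best)  # optimal: treat
```

With these numbers, treating has expected loss 0.8 and not treating has expected loss 2.0, so treating is chosen even though the disease is the less probable state.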
P
- P-valueThe probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one observed. A small p-value indicates the data are inconsistent with H₀. It is compared to a pre-specified significance level α; if p ≤ α, H₀ is rejected. The p-value does not equal the probability that H₀ is true.
- Pairwise independenceA property of a collection of events in which every pair of events is independent, so P(Aᵢ ∩ Aⱼ) = P(Aᵢ)P(Aⱼ) for all pairs i ≠ j. Pairwise independence is weaker than mutual independence; a set of pairwise-independent events can still exhibit higher-order dependence among triples or larger subsets.
- ParameterA fixed numerical quantity that describes a characteristic or distribution of an entire population, such as the population mean μ, variance σ², or proportion p. Parameters are typically unknown and estimated from sample statistics. Unlike a statistic, a parameter does not vary from sample to sample; it is a fixed property of the population.
- Particle filterAlso known as sequential Monte Carlo, a particle filter performs Bayesian recursive filtering for nonlinear, non-Gaussian state-space models. It represents the posterior distribution of the hidden state as a weighted set of random samples (particles), updating weights via importance sampling at each time step. Widely used in robotics, tracking, and financial modeling.
- PercentileThe value below which a given percentage of observations in a dataset falls. The pth percentile is the value Pₚ such that p% of the data lie at or below it. The median corresponds to the 50th percentile. Percentiles are widely used to describe distributions, set reference ranges in clinical medicine, and compare groups.
- Pie chartA circular chart divided into slices where each slice's area is proportional to the relative frequency or percentage of a category. Pie charts effectively display part-to-whole relationships for a small number of categories. They are less effective than bar charts when categories are numerous or when differences between slices are small.
- Point estimationThe use of sample data to compute a single numerical value as an estimate of an unknown population parameter. Common point estimators include the sample mean, sample proportion, and maximum likelihood estimator. Unlike interval estimation, a point estimate conveys no information about the precision or uncertainty of the estimate.
- Population parameterA fixed numerical quantity that describes a characteristic of an entire population, such as the population mean (μ) or variance (σ²). Because the full population is rarely accessible, parameters are typically estimated from sample statistics. They are conventionally denoted by Greek letters.
- Posterior probabilityIn Bayesian inference, the updated probability distribution of a parameter or hypothesis after observing data. Derived via Bayes' theorem as the product of the prior and the likelihood, normalized: P(θ|data) ∝ P(data|θ) × P(θ). It forms the primary output of Bayesian analysis.
- Principal component analysisA dimensionality reduction technique that transforms correlated variables into a smaller set of uncorrelated linear combinations called principal components, ordered by the variance they explain. Based on eigenvalue decomposition of the covariance or correlation matrix. Widely used for data compression, visualization, and multicollinearity reduction.
- Prior probabilityIn Bayesian statistics, the probability assigned to a parameter or hypothesis before observing new data. It encodes prior knowledge, expert judgment, or theoretical assumptions. Combined with the likelihood of observed data via Bayes' theorem, it yields the posterior probability.
- ProbabilityA numerical measure between 0 and 1 quantifying the likelihood of an event occurring. Under the frequentist interpretation it represents long-run relative frequency; under the Bayesian interpretation, degree of belief. Formally defined by Kolmogorov's axioms: non-negativity, normalization, and countable additivity.
- Probability density functionA function f(x) describing the relative likelihood of a continuous random variable taking a given value, satisfying f(x) ≥ 0 and integrating to 1 over its domain. The probability of the variable falling in an interval is the integral of f(x) over that interval; probability at any single point is zero.
- Probability distributionA mathematical function that assigns probabilities to the possible values of a random variable. Discrete variables are described by a probability mass function; continuous variables by a probability density function. Common examples include the normal, binomial, Poisson, and exponential distributions.
- Probability measureA function defined on a σ-algebra that satisfies Kolmogorov's axioms: non-negative values for all events, total measure of the sample space equals 1, and countable additivity for disjoint events. It is the formal mathematical foundation of probability theory and the third element of a probability space.
- Probability spaceThe fundamental mathematical framework of probability theory, defined as the triple (Ω, F, P), where Ω is the sample space, F is a σ-algebra of events, and P is a probability measure on F. Formalized by Kolmogorov, it provides the rigorous axiomatic foundation for all of modern probability theory.
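The relation P(θ|data) ∝ P(data|θ) × P(θ) from the posterior probability entry can be worked through on a small discrete example. The candidate values of θ, the uniform prior, and the observed counts below are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical candidate values for a coin's heads probability, with a uniform prior.
thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = {t: Fraction(1, 3) for t in thetas}

# Hypothetical data: 3 heads and 1 tail.
heads, tails = 3, 1


def likelihood(theta):
    """Likelihood of the data up to the binomial coefficient, which cancels on normalization."""
    return theta**heads * (1 - theta) ** tails


# Prior times likelihood, then normalize so the posterior sums to 1.
unnormalized = {t: likelihood(t) * prior[t] for t in thetas}
evidence = sum(unnormalized.values())
posterior = {t: u / evidence for t, u in unnormalized.items()}

for t in thetas:
    print(t, posterior[t])
# 1/4 3/46
# 1/2 8/23
# 3/4 27/46
```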
Q
- QuantileValues that divide a probability distribution or dataset into intervals with equal probabilities. The p-th quantile is the value below which a proportion p of observations fall. The median is the 0.5 quantile. Percentiles, quartiles, and deciles are special cases of quantiles.
- QuartileThe three values that divide a sorted dataset into four equal parts: first quartile (Q1, 25th percentile), second quartile (Q2, median, 50th), and third quartile (Q3, 75th). The interquartile range (IQR = Q3 − Q1) measures the spread of the middle half and is used to identify outliers.
- Quota samplingA non-probability sampling method in which the researcher divides the population into subgroups based on specified characteristics and fills predetermined quotas from each subgroup using convenient selection. Improves representativeness over pure convenience sampling but lacks random selection, limiting statistical generalizability.
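Quartiles and the interquartile range can be computed with Python's `statistics.quantiles`. Two assumptions are made in this sketch: the function's default "exclusive" interpolation method is used (other conventions give slightly different quartiles), and outliers are flagged with the common 1.5 × IQR fence rule, which the glossary itself does not specify. The data are made up.

```python
from statistics import quantiles

# Hypothetical data with one suspiciously large value.
data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 40]

# n=4 returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Assumed convention: flag values beyond 1.5 * IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3)  # 4.0 7.0 10.0
print(iqr)         # 6.0
print(outliers)    # [40]
```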
R
- Random variableA measurable function that maps outcomes in a sample space to real numbers. Classified as discrete if it takes countable values, or continuous if it takes values over a continuous interval. Its probability distribution fully characterizes the probabilities associated with all possible values.
- Randomized block designAn experimental design in which experimental units are grouped into homogeneous blocks and treatments are randomly assigned within each block. Blocking controls for a known source of variability, reducing error variance and increasing precision of treatment effect estimates compared to a completely randomized design.
- RangeThe difference between the maximum and minimum values in a dataset: R = X_max − X_min. It is the simplest measure of dispersion but highly sensitive to outliers since it depends only on the two extreme observations and ignores the distribution of all values in between.
- Recursive Bayesian estimationA sequential Bayesian framework for updating the probability distribution of a system state as new observations arrive. The previous posterior becomes the new prior at each step, updated with the likelihood of incoming data. The Kalman filter and particle filter are prominent special-case implementations.
- Regression analysisA statistical method for modeling the relationship between a dependent variable and one or more independent variables. Used for prediction, quantifying association strength, and causal inference. Common forms include simple and multiple linear regression, logistic regression, and polynomial regression.
- Repeated measures designAn experimental design in which the same subjects are measured under multiple conditions or time points. Each subject serves as its own control, removing between-subject variability and yielding higher statistical power with fewer participants. Requires testing the assumption of sphericity (e.g., Mauchly's test).
- Response variableThe outcome variable measured in a study that is to be explained or predicted; also called the dependent variable. In regression and experimental designs, it is the variable on which the effect of predictors or treatments is observed. Its type (continuous, binary, count) determines the appropriate statistical model.
- Restricted randomizationRandomization conducted under specified constraints rather than complete free randomization. Includes block randomization, stratified randomization, and minimization procedures. Used to ensure balance across treatment groups and improve practical feasibility while preserving the validity of causal inference.
- Robust statisticsA branch of statistics providing methods and estimators that remain reliable in the presence of outliers, departures from distributional assumptions, or model misspecification. Key examples include the median, trimmed mean, and M-estimators. Breakdown point and influence function are standard measures of robustness.
- Round-off errorThe error introduced when a real number is approximated by a finite-precision representation, arising from the limitations of floating-point arithmetic in digital computation. Accumulated round-off errors in numerical calculations can degrade result accuracy, particularly in iterative algorithms or operations involving large datasets.
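Accumulated round-off error is easy to observe when adding a value that has no exact binary representation. The sketch below compares a plain running total with `math.fsum`, which tracks the lost low-order bits; the choice of ten copies of 0.1 is purely illustrative.

```python
import math

values = [0.1] * 10

# Plain left-to-right accumulation: each addition rounds to the nearest double.
running_total = 0.0
for v in values:
    running_total += v

# math.fsum returns the correctly rounded sum of the stored values.
accurate_total = math.fsum(values)

print(running_total == 1.0)   # False
print(accurate_total == 1.0)  # True
print(0.1 + 0.2 == 0.3)       # False
```

An explicit loop is used for the naive total because the built-in `sum` may apply its own compensation for floats in recent Python versions.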
S
- SampleA subset of a population selected to estimate population parameters and make statistical inferences. A valid sample must be representative of the population to support generalizable conclusions. Sample size and selection method directly affect the precision and unbiasedness of the resulting estimates.
- Sample covarianceA statistic estimating the linear co-variability of two variables from sample data: s_xy = Σ(x_i − x̄)(y_i − ȳ) / (n−1). Division by (n−1) is Bessel's correction, yielding an unbiased estimator of the population covariance. Its sign indicates the direction of the linear association; its magnitude depends on the units of measurement, so the strength of association is usually judged from the correlation coefficient instead.
- Sample meanThe arithmetic average of all observations in a sample: x̄ = Σx_i / n. It is an unbiased and consistent estimator of the population mean μ. The sampling distribution of the sample mean is central to the central limit theorem and underpins many inferential procedures.
- Sample spaceThe universal set of all possible outcomes of a random experiment, conventionally denoted Ω. For example, a single coin toss gives Ω = {heads, tails}; two dice give a 36-element set. It is the first element of a probability space, and all events are subsets of the sample space.
- SamplingThe process of selecting a subset from a larger population to make statistical inferences about it. Probability-based methods (simple random, stratified, systematic, cluster) support unbiased estimation, while non-probability methods (convenience, quota, snowball) are used under practical constraints but limit generalizability.
- Sampling biasA systematic error arising when the sampling method over- or under-represents certain members of the population relative to their true prevalence. Common forms include voluntary response bias, survivorship bias, and selection bias. Unlike random error, it cannot be reduced by increasing sample size — the method itself must be corrected.
- Sampling distributionThe probability distribution of a statistic (e.g., sample mean) obtained by repeatedly drawing samples of the same size from a population. It is the object the central limit theorem describes: as sample size increases, the sampling distribution of the mean approaches normality for any population with finite variance, whatever its shape.
- Sampling errorThe discrepancy between a statistic computed from a sample and the true population parameter, arising purely from chance variation in which units are selected. It decreases as sample size increases and is distinct from systematic bias, which stems from flawed sampling procedures rather than random variation.
- Scale parameterA parameter that controls the spread or dispersion of a probability distribution without changing its shape. Multiplying the random variable by a positive constant scales this parameter proportionally. The standard deviation in a normal distribution and the inverse rate in an exponential distribution are canonical examples.
- Scatter plotA graph that displays the relationship between two continuous variables by plotting each observation as a point in a Cartesian plane, with one variable on the horizontal axis and the other on the vertical. It reveals the direction, strength, and linearity of association and highlights outliers or clusters.
- Significance levelThe pre-specified probability threshold (α, commonly 0.05 or 0.01) for rejecting the null hypothesis when it is true, thereby controlling the Type I error rate. If the p-value of the test statistic falls below α, the null hypothesis is rejected. It must be set before data collection to be valid.
- Simple random sampleA sampling method in which every unit in the population and every possible group of n units has an equal probability of being selected. When units are drawn without replacement the individual selections are not independent, but standard estimators such as the sample mean remain unbiased. It is the foundational probability sampling design upon which more complex designs are built.
- Simpson's paradoxA statistical phenomenon in which a trend that appears consistently within several sub-groups reverses or disappears when those groups are combined. It arises from a confounding variable that is unevenly distributed across groups. Formalized by E. H. Simpson in 1951, it underscores the necessity of controlling for lurking variables in causal reasoning.
- SkewnessA measure of the asymmetry of a probability distribution about its mean, defined as the standardized third central moment: E[(X−μ)³]/σ³. Positive skewness indicates a longer right tail; negative skewness indicates a longer left tail. A symmetric distribution has skewness of zero. It affects the relationship between mean, median, and mode.
- Spectrum biasThe phenomenon in which the diagnostic accuracy measures of a test — sensitivity, specificity, and predictive values — vary depending on the spectrum of disease severity in the population being tested. Tests developed on clearly defined cases may overestimate performance when applied to general populations containing mild or ambiguous presentations.
- Standard deviationThe square root of the variance, expressing the average spread of data values around the mean in the original units of measurement. Denoted σ for a population and s for a sample; the sample version uses n−1 in the denominator (Bessel's correction) to yield an unbiased variance estimate. It is the most widely used measure of dispersion.
- Standard errorThe standard deviation of the sampling distribution of a statistic. For the sample mean it equals σ/√n, decreasing as sample size grows. It quantifies the precision of an estimator and is the fundamental building block for constructing confidence intervals and conducting hypothesis tests about population parameters.
- Standard scoreA dimensionless value indicating how many standard deviations an observation lies from the population mean, computed as z = (x − μ)/σ. It standardizes variables to a common scale, enabling comparison across different units of measurement, and maps directly to probabilities via the standard normal distribution table.
- StatisticAny numerical quantity computed from sample data, used to estimate or make inferences about a population parameter. Common examples include the sample mean, sample median, and sample standard deviation. Unlike a fixed population parameter, a statistic is a random variable whose value changes from sample to sample.
- Statistical dispersionThe extent to which a distribution is spread out or squeezed together, describing variability in a dataset. Common measures include variance, standard deviation, range, interquartile range, and mean absolute deviation. Dispersion measures complement central tendency statistics to provide a complete characterization of a distribution's shape.
- Statistical graphicsThe discipline of exploring, summarizing, and communicating data through visual representation. It encompasses tools such as histograms, box plots, scatter plots, and Q-Q plots. Closely associated with John Tukey's exploratory data analysis framework, statistical graphics enable rapid perception of patterns, outliers, distributions, and relationships in data.
- Statistical hypothesis testingA formal procedure for deciding whether sample evidence warrants rejection of a null hypothesis (H₀) about a population. A test statistic is computed and its p-value is compared to a pre-set significance level α. The Neyman-Pearson framework formalizes the trade-off between Type I error (false rejection) and Type II error (false retention).
- Statistical independenceThe condition in which knowing the outcome of one event or random variable provides no information about another. For events, formally defined as P(A∩B) = P(A)P(B); for random variables, the joint distribution equals the product of marginals. Statistical independence is a core assumption underlying many inferential procedures including t-tests and linear regression.
- Statistical inferenceThe process of drawing conclusions about population parameters from sample data, quantifying uncertainty in probabilistic terms. It encompasses point estimation, interval estimation (confidence or credible intervals), and hypothesis testing. Frequentist and Bayesian frameworks offer distinct philosophical foundations for how probability is interpreted and uncertainty is expressed.
- Statistical modelA mathematical representation of a data-generating process, formally defined as a family of probability distributions indexed by parameters. It encodes assumptions about the distributions of observed variables and their relationships. Examples include linear regression, generalized linear models, and Bayesian networks. Model selection and validation are central statistical activities.
- Statistical populationThe complete set of all units or observations that are of interest in a statistical investigation, from which a sample is drawn. A population may be finite and concrete (all registered voters) or infinite and theoretical (all possible measurements of a process). Statistical inference uses the sample to generalize conclusions back to the population.
- Statistical powerThe probability that a statistical test correctly rejects a false null hypothesis, expressed as 1 − β where β is the Type II error probability. Power increases with larger sample size, larger effect size, higher significance level, and lower variance. It is the central quantity in sample size determination to ensure a study is adequately sensitive to detect a true effect.
- Statistical significanceThe determination that a test result is unlikely to have occurred by chance alone, assessed by comparing the p-value to the pre-set significance level α. A result is statistically significant when p < α. Statistical significance does not imply practical or clinical importance; effect size and context must also be considered for meaningful interpretation.
- Stem-and-leaf displayAn exploratory data display that splits each data value into a stem (leading digit(s)) and a leaf (trailing digit), preserving the raw values while simultaneously revealing the distribution's shape, center, and spread. Popularized by John Tukey, it functions as a text-based histogram and is most practical for small to medium datasets.
- Stratified samplingA probability sampling design in which the population is divided into mutually exclusive, exhaustive strata, and independent samples are drawn from each stratum. It guarantees representation of key subgroups and can improve estimation efficiency over simple random sampling, particularly when strata are internally homogeneous but differ from each other.
- Student's t-testA parametric test for comparing means when the population variance is unknown, relying on the t-distribution. Developed by William Gosset under the pseudonym 'Student' in 1908. Variants include one-sample, independent two-sample, and paired tests. Valid for small samples under normality; robust to mild violations with larger samples via the central limit theorem.
- Survey methodologyThe scientific discipline concerned with designing, conducting, and evaluating surveys that collect data from human populations. It addresses sampling design, questionnaire construction, response and nonresponse biases, measurement error, and weighting procedures. Survey methodology is central to official statistics, social science research, and market research.
- Survival functionThe function giving the probability that a subject or system survives beyond time t, defined as S(t) = P(T > t) = 1 − F(t), the complement of the cumulative distribution function. It is monotonically non-increasing from S(0) = 1. Empirically estimated by the Kaplan-Meier estimator, which handles right-censored observations common in clinical trials.
- Survivorship biasA logical error and selection bias that occurs when analysis focuses only on units that passed a selection process, ignoring those that did not survive or were eliminated. Studying only existing successful firms or returning aircraft leads to systematically misleading conclusions. Abraham Wald's WWII aircraft armor analysis is the canonical historical example.
- Symmetric probability distributionA probability distribution whose density or mass function is symmetric about a central point c, satisfying f(c − x) = f(c + x) for all x. When the relevant moments exist, such distributions have zero skewness and a mean and median equal to c; if the distribution is also unimodal, the mode is at c as well. Examples include the normal, Student's t, uniform, and Laplace distributions.
- Systematic samplingA sampling method in which every k-th unit is selected from an ordered list after a random start, where the sampling interval k = N/n. It is simple to implement and can be more efficient than simple random sampling for randomly ordered lists, but risks periodicity bias if the list has a cyclic structure aligned with the sampling interval.
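The formulas given above for the sample mean, sample covariance, standard deviation, and standard error can be computed directly. This is a sketch on made-up data; the standard error uses the sample standard deviation s in place of the unknown σ, which is an assumption of the example rather than part of the definition.

```python
import math

# Hypothetical paired observations.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
n = len(x)

# Sample means: x_bar = sum(x_i) / n.
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance with Bessel's correction (divide by n - 1).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviation of x: square root of the (n - 1)-denominator variance.
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))

# Estimated standard error of the sample mean: s / sqrt(n).
se_x_bar = s_x / math.sqrt(n)

print(x_bar, y_bar)        # 5.0 3.0
print(round(s_xy, 4))      # 4.6667
print(round(s_x, 4))       # 2.582
print(round(se_x_bar, 4))  # 1.291
```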
T
- Test statisticA numerical value computed from sample data used to evaluate a null hypothesis in a statistical test. It measures how far the observed data deviate from what is expected under the null hypothesis. Common examples include the t-statistic, F-statistic, z-score, and chi-square statistic; the computed value is compared to a critical value or used to derive a p-value.
- Tidy dataA data organization principle formulated by Hadley Wickham in which each variable forms a column, each observation forms a row, and each type of observational unit forms a table. This consistent structure simplifies data cleaning, transformation, and visualization, and underpins the tidyverse ecosystem in R.
- Time domainThe analytical framework in which a signal or time series is represented directly as values indexed by time, as opposed to the frequency domain. Methods such as autocorrelation, moving averages, and ARMA models are defined in the time domain. Patterns including level, trend, and cyclical behavior are examined along the time axis.
- Time seriesA sequence of observations collected sequentially at specific time points or intervals. The temporal dependence between successive observations distinguishes time series from independent samples. Widely used in economics, meteorology, finance, and engineering, time series can be decomposed into trend, seasonality, cyclical, and irregular components.
- Time series analysisA collection of statistical methods aimed at uncovering the structure of data measured sequentially over time. Core tools include stationarity testing, autocorrelation functions, ARIMA modeling, spectral analysis, and seasonal decomposition. The goals are to understand the dependence structure of the series and to build models suitable for forecasting future values.
- Time series forecastingThe process of predicting future values of a time series using past observations and statistical or machine-learning models. Methods include ARIMA, exponential smoothing, VAR, and neural network–based approaches. Forecast accuracy is evaluated using metrics such as MAE, RMSE, and MAPE; the forecast horizon and uncertainty intervals are critical components of any forecast report.
- Trimmed estimatorA statistical estimator computed after removing a specified percentage of the smallest and largest values from an ordered dataset. The trimmed mean is the most common example. Trimmed estimators are robust to outliers and provide reliable location estimates in heavily skewed or contaminated distributions where the full-sample mean would be misleading.
- Type I and type II errorsTwo types of decision errors in hypothesis testing. A Type I error, whose probability is denoted α, occurs when a true null hypothesis is rejected; it is also called a false positive. A Type II error, whose probability is denoted β, occurs when a false null hypothesis is not rejected; it is also called a false negative. There is a trade-off between the two: reducing one typically increases the other, given a fixed sample size.
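The trimmed mean described above can be sketched in a few lines. The function name `trimmed_mean` and the choice to round the number of trimmed values down are assumptions of this example; statistical packages differ on how fractional trim counts are handled.

```python
def trimmed_mean(data, proportion):
    """Mean after dropping the given proportion of values from each end of the sorted data."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(data)
    k = int(len(ordered) * proportion)  # number of values removed from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 100]

print(sum(data) / len(data))    # 22.0 (ordinary mean, pulled up by 100)
print(trimmed_mean(data, 0.2))  # 3.0  (drops 1 and 100)
```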
W
- Weighted arithmetic meanAn average in which each observation is multiplied by a weight reflecting its relative importance or frequency, and the sum of these products is divided by the sum of the weights: x̄_w = Σw_i x_i / Σw_i. Used in survey design with sampling weights, index construction such as price indices, and meta-analysis, where observations contribute unequally to the overall estimate and the standard arithmetic mean would be misleading.
- Weighted medianA robust central tendency measure for weighted data, defined as the value at which the cumulative sum of weights, taken over the sorted observations, first reaches or exceeds half of the total weight (0.5 when the weights are normalized to sum to 1). It extends the ordinary median to settings where observations carry unequal importance, such as weighted survey data for median income estimation and in robust regression methods, reducing sensitivity to extreme values.
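Both weighted measures can be sketched briefly. The function names and data are illustrative; the weighted median here returns the first sorted value whose cumulative weight reaches half the total (the "lower" weighted median), which is one of several conventions.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """First sorted value at which the cumulative weight reaches half the total weight."""
    half = sum(weights) / 2
    cumulative = 0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values and weights must be non-empty")


# Hypothetical observations and weights.
values = [1, 2, 3, 100]
weights = [3, 2, 4, 1]

print(weighted_mean(values, weights))    # 11.9
print(weighted_median(values, weights))  # 2
```

The extreme value 100 pulls the weighted mean up to 11.9, while the weighted median stays at 2.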