Support Vector Machines and Kernel Methods

Support vector machines find the decision boundary that maximizes the margin between classes, and the kernel trick lets such linear methods operate implicitly in rich nonlinear feature spaces.

Definition

A support vector machine is a classifier that chooses the separating hyperplane maximizing the distance to the nearest training points. Kernel methods generalize this by computing inner products through a kernel function, which lets linear algorithms fit nonlinear boundaries without explicitly constructing the high-dimensional feature space.
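
As a sketch of this definition in symbols, assume training pairs (x_i, y_i) with labels y_i in {-1, +1}, a weight vector w, and an offset b. This notation is introduced here for illustration and is not taken from the text above. For linearly separable data, the maximum-margin hyperplane is commonly written as the solution of:

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b)\ \ge\ 1,\qquad i = 1,\dots,n .
```

Under this scaling the nearest training points lie at distance 1/||w|| from the hyperplane, so minimizing ||w|| is the same as maximizing the margin.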

Scope

This topic covers maximum-margin classification, the primal and dual formulations of the support vector machine, the role of support vectors and slack variables for non-separable data, the kernel trick that replaces inner products with kernel functions, common kernels such as polynomial and radial basis functions, and the extension of kernelization to regression and other linear methods.

Core questions

Why does maximizing the margin tend to improve generalization?
How does the dual formulation express the solution in terms of support vectors?
What does the kernel trick accomplish and why is it efficient?
How are soft margins and slack variables used when classes overlap?
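
For reference, the dual problem that the second and third questions point to can be sketched as follows. The symbols are assumptions made for this illustration: alpha_i are the dual coefficients, C is the soft-margin regularization parameter, and k is the kernel function (the ordinary inner product in the linear case).

```latex
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
 - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\,\alpha_j\,y_i\,y_j\,k(x_i,x_j)
\quad\text{subject to}\quad
0 \le \alpha_i \le C,\qquad \sum_{i=1}^{n}\alpha_i\,y_i = 0,
\\[6pt]
f(x) = \operatorname{sign}\Big(\sum_{i:\ \alpha_i > 0}\alpha_i\,y_i\,k(x_i,x) + b\Big).
```

The training points enter only through the values k(x_i, x_j), and only points with alpha_i > 0, the support vectors, contribute to the prediction f(x). Letting C grow without bound recovers the hard-margin case.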

Key theories

Maximum-margin separation: Among all separating hyperplanes, the one that maximizes the margin to the closest points is unique, is determined by the support vectors alone (often a small subset of the training data), and is associated with good generalization bounds.
The kernel trick: Because the optimization depends on the data only through inner products, replacing them with a kernel function evaluates a nonlinear feature map implicitly. Nonlinear boundaries are fit with the same linear algorithm, at a cost that scales with the number of training-point pairs rather than with the dimension of the feature space.
Soft-margin and slack variables: Allowing controlled margin violations through slack variables and a regularization parameter makes the support vector machine applicable to overlapping, noisy classes while trading off margin width against training errors.

Practical relevance

Support vector machines and kernel methods were among the leading high-accuracy classifiers before deep learning and remain strong choices for moderate-sized problems, especially in text classification and bioinformatics. The kernel idea also generalizes far beyond classification, appearing in kernel regression, Gaussian processes, and kernelized principal component analysis.

History

The maximum-margin idea and the kernel trick were combined by Boser, Guyon, and Vapnik around 1992, and the soft-margin support vector machine was formalized by Cortes and Vapnik in 1995. Through the late 1990s and 2000s kernel methods became dominant in pattern recognition before being largely displaced by deep learning on large-scale perceptual tasks.

Key figures

Vladimir Vapnik
Corinna Cortes
Bernhard Schölkopf

Seminal works

cortes1995
vapnik1995
bishop2006

Frequently asked questions

What is a support vector? A support vector is a training point that lies on the margin, inside it, or on the wrong side of the decision boundary, and thus determines the position of that boundary. The fitted classifier depends only on these points: removing any other training point and retraining would leave the boundary unchanged, and prediction requires storing only the support vectors.
Why is it called a trick to use kernels? The kernel trick lets an algorithm behave as if it had mapped the data into a very high-dimensional, or even infinite-dimensional, feature space while only ever computing kernel values between pairs of points. It thereby avoids the cost of constructing that space explicitly.
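
The answer about support vectors can be illustrated with a toy example. The sketch below assumes a four-point, linearly separable data set with a linear kernel. The dual coefficients `alpha` and offset `b` were worked out by hand for this particular set and are not the output of a solver, and all names are hypothetical. It shows that the decision value is the same whether the sum runs over every training point or only over the support vectors.

```python
def linear_kernel(u, v):
    """Linear kernel: the ordinary inner product."""
    return sum(a * b for a, b in zip(u, v))


# Toy training set (assumed for illustration).
X = [(1.0, 1.0), (-1.0, -1.0), (3.0, 3.0), (-2.0, -4.0)]
y = [1, -1, 1, -1]

# Hand-derived hard-margin dual solution for this set: only the two
# points closest to the boundary receive a nonzero coefficient.
alpha = [0.25, 0.25, 0.0, 0.0]
b = 0.0


def decision(x, indices):
    """Kernel expansion of the decision value over the given training indices."""
    return sum(alpha[i] * y[i] * linear_kernel(X[i], x) for i in indices) + b


# Support vectors are the training points with a nonzero dual coefficient.
support = [i for i, a in enumerate(alpha) if a > 0]

query = (0.5, 2.0)
full = decision(query, range(len(X)))   # sum over all training points
sv_only = decision(query, support)      # sum over support vectors only

print(support, full == sv_only)
```

The two points far from the boundary have zero coefficients, so dropping them changes nothing at prediction time. Which points end up as support vectors is only known after training.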