What does shattering mean?

A set of points is shattered by a model class if, for every possible assignment of labels to those points, some model in the class produces exactly that labeling. The size of the largest shatterable set is the Vapnik-Chervonenkis dimension.

Is a model with more parameters always higher capacity?

Not necessarily. Capacity is measured by the Vapnik-Chervonenkis dimension or related quantities, which can differ from the parameter count. The right measure of complexity for generalization is capacity, not simply how many parameters a model has.

VC Dimension and Capacity

The Vapnik-Chervonenkis dimension measures the capacity of a model class by the largest set of points it can label in all possible ways, quantifying how complex a learner is.

Găsește o temă cu PaperMindÎn curândFind papers & topics

Tools & resources

Descarcă prezentarea

Learn & explore

VideoÎn curând

Definition

The Vapnik-Chervonenkis dimension of a class of classifiers is the largest number of points that the class can label in every possible way; it is a measure of capacity that bounds how much the class can overfit and therefore how much data is needed to learn reliably.

Scope

This topic covers measures of the richness of a hypothesis class: the notion of shattering a set of points, the Vapnik-Chervonenkis dimension as the size of the largest shattered set, the growth function, and how these capacity measures enter generalization bounds. It explains why capacity, rather than the number of parameters alone, governs the ability to generalize.

Core questions

What does it mean for a model class to shatter a set of points?
How is the Vapnik-Chervonenkis dimension defined and computed?
Why does capacity rather than parameter count govern generalization?
How does capacity enter bounds on the gap between training and true error?

Key theories

Shattering and capacity: A class shatters a set of points if it can realize every possible labeling of them; the largest such set defines the Vapnik-Chervonenkis dimension, a distribution-free measure of how flexible the class is.
Capacity controls uniform convergence: Finite capacity ensures that empirical error converges to true error uniformly over the class, so a learner with bounded Vapnik-Chervonenkis dimension cannot overfit arbitrarily as data grow.
Capacity versus parameter count: Capacity, not the raw number of parameters, determines generalization, so two models with the same parameter count can differ greatly in how much data they require.

Clinical relevance

The Vapnik-Chervonenkis dimension provides the central capacity measure of classical learning theory and justifies the practice of controlling model complexity; it underlies the margin-based analysis of support vector machines and frames ongoing efforts to understand why some very high-capacity models nonetheless generalize.

History

Vapnik and Chervonenkis introduced the dimension that bears their names in work from the late 1960s and the 1971 paper on uniform convergence, establishing a distribution-free theory of capacity. The concept became foundational for support vector machines and for the broader analysis of generalization.

Key figures

Vladimir Vapnik
Alexey Chervonenkis

Seminal works

vapnik1971
vapnik1995
hastie2009

Frequently asked questions

What does shattering mean?: A set of points is shattered by a model class if, for every possible assignment of labels to those points, some model in the class produces exactly that labeling. The size of the largest shatterable set is the Vapnik-Chervonenkis dimension.
Is a model with more parameters always higher capacity?: Not necessarily. Capacity is measured by the Vapnik-Chervonenkis dimension or related quantities, which can differ from the parameter count. The right measure of complexity for generalization is capacity, not simply how many parameters a model has.