Self-Supervised and Representation Learning

Self-supervised and representation learning create useful features from unlabeled data by inventing prediction tasks from the data itself, producing representations that transfer to many downstream problems.

Definition

Self-supervised learning trains a model on tasks whose labels are derived automatically from the input, such as predicting a hidden part of the data or recognizing two augmented views as the same item, so that the model learns general-purpose representations usable for later supervised tasks.
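As a rough illustration of labels "derived automatically from the input", the sketch below hides some tokens of an unlabeled sentence and keeps the originals as prediction targets. The function name `make_masked_example`, the `[MASK]` placeholder, and the masking probability are assumptions made for this example, not part of any particular system.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, chosen for this sketch


def make_masked_example(tokens, mask_prob=0.15, rng=None):
    """Turn one unlabeled token sequence into an (input, targets) pair.

    targets maps each masked position to the original token, so the
    labels come from the data itself rather than from an annotator.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted[i] = MASK
            targets[i] = tok
    # Guarantee at least one target so every sequence yields a training signal.
    if not targets and tokens:
        i = rng.randrange(len(tokens))
        corrupted[i] = MASK
        targets[i] = tokens[i]
    return corrupted, targets


sentence = "the cat sat on the mat".split()
masked_input, targets = make_masked_example(
    sentence, mask_prob=0.3, rng=random.Random(0)
)
print(masked_input)  # sentence with some tokens replaced by [MASK]
print(targets)       # position -> original token, used as the label
```

A model trained on such pairs would receive `masked_input` and be scored on how well it recovers the tokens in `targets`; no human annotation is involved at any point.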

Scope

This topic covers learning representations without human labels: autoencoders that compress and reconstruct inputs, contrastive methods that pull together related views and push apart unrelated ones, and pretext or masked-prediction tasks that turn unlabeled data into supervised signals. It addresses why good representations matter and how pretrained features transfer across tasks.
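The "pull together, push apart" idea behind contrastive methods can be sketched with an InfoNCE-style loss. This is a minimal, one-directional version under stated assumptions: the embeddings are made-up toy vectors, and `info_nce`, `cosine`, and the temperature value are names and settings chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE-style loss over a batch of paired embeddings.

    view_a[i] and view_b[i] embed two augmentations of the same item
    (the positive pair); view_b[j] for j != i serve as negatives.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the true positive.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings, invented for illustration.
anchors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]  # positives near their anchors
shuffled = [aligned[1], aligned[2], aligned[0]]   # positives mismatched

loss_aligned = info_nce(anchors, aligned)
loss_shuffled = info_nce(anchors, shuffled)
print(loss_aligned, loss_shuffled)
```

The loss is low when each anchor is most similar to its own second view and high when another item's view is closer, which is the pressure that shapes the representation during training.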

Core questions

How can supervised-style training signals be generated from unlabeled data?
What makes a learned representation useful and transferable?
How do contrastive and reconstructive objectives differ?
Why does pretraining on large unlabeled corpora help downstream tasks?

Key theories

Representation learning: The quality of a learned representation, rather than the choice of classifier, often determines performance, so learning features that disentangle the underlying factors of variation is a central goal.
Autoencoding and reconstruction: Autoencoders learn compact codes by reconstructing their inputs through a bottleneck, and variants such as denoising autoencoders learn robust features by reconstructing corrupted inputs.
Pretraining and transfer: Models pretrained on large unlabeled datasets with self-supervised objectives learn broadly useful features that transfer to many downstream tasks with little labeled data, a paradigm central to modern systems.
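The autoencoding entry above can be made concrete with a deliberately tiny denoising autoencoder: a linear encoder squeezes 2-D points through a 1-D bottleneck, and a linear decoder reconstructs the clean point from a noise-corrupted input. Everything here (the toy data on the line y = 2x, the learning rate, the noise level, and the function names) is an assumption chosen to keep the sketch short; practical autoencoders use nonlinear networks and a deep-learning library.

```python
import random


def encode(w, x):
    # Bottleneck: project the 2-D input onto a single number.
    return w[0] * x[0] + w[1] * x[1]


def decode(v, z):
    # Expand the 1-D code back to 2-D.
    return [v[0] * z, v[1] * z]


def clean_error(w, v, data):
    """Mean squared reconstruction error on uncorrupted inputs."""
    total = 0.0
    for x in data:
        x_hat = decode(v, encode(w, x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


def train_denoising_autoencoder(data, noise_std=0.1, lr=0.02, epochs=300, seed=0):
    rng = random.Random(seed)
    w = [0.1, 0.1]  # encoder weights
    v = [0.1, 0.1]  # decoder weights
    for _ in range(epochs):
        for x in data:
            # Corrupt the input; the target stays the clean x.
            noisy = [xi + rng.gauss(0.0, noise_std) for xi in x]
            z = encode(w, noisy)
            x_hat = decode(v, z)
            err = [x_hat[0] - x[0], x_hat[1] - x[1]]
            # Gradients of the squared error ||x_hat - x||^2.
            grad_z = 2.0 * (err[0] * v[0] + err[1] * v[1])
            grad_v = [2.0 * err[0] * z, 2.0 * err[1] * z]
            grad_w = [grad_z * noisy[0], grad_z * noisy[1]]
            v = [v[k] - lr * grad_v[k] for k in range(2)]
            w = [w[k] - lr * grad_w[k] for k in range(2)]
    return w, v


# Toy data: points on the line y = 2x, so one number describes each point.
data = [[t / 10.0, 2.0 * t / 10.0] for t in range(-10, 11)]
before = clean_error([0.1, 0.1], [0.1, 0.1], data)
w, v = train_denoising_autoencoder(data)
after = clean_error(w, v, data)
print(before, after)  # reconstruction error before and after training
```

Because the data vary along a single direction, a one-number code is enough to reconstruct them; the bottleneck forces the encoder to find that direction, and the injected noise discourages it from relying on anything else.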

Practical relevance

Self-supervised pretraining underlies modern language and vision systems: models first absorb knowledge from vast unlabeled corpora and are then adapted to specific tasks using limited labels. This sharply reduces the labeled data needed for strong performance and is a major reason for recent advances in artificial intelligence.

History

Representation learning grew from autoencoders and unsupervised pretraining of deep networks in the 2000s. Self-supervised objectives, including masked prediction in language and contrastive learning in vision, later proved capable of learning powerful general-purpose representations, becoming the dominant approach to pretraining large models.

Key figures

Yoshua Bengio
Geoffrey Hinton
Yann LeCun

Seminal works

bengio2013
goodfellow2016
lecun2015

Frequently asked questions

How is self-supervised learning different from unsupervised learning?
Self-supervised learning is a form of unsupervised learning in which the model is trained with a supervised-style objective whose targets are generated automatically from the data, for example by hiding part of the input and predicting it. It uses no human labels but still frames learning as prediction.

Why is a good representation so valuable?
Once data are encoded into a representation that captures their essential structure, even simple models can perform well, and the same representation can serve many tasks. Learning such transferable features from unlabeled data is what makes pretraining so effective.
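The claim that a good representation lets even simple models perform well can be seen in a toy case. In the sketch below, a one-feature threshold rule fails on a raw coordinate but separates the classes perfectly once the points are re-encoded by their squared radius. Note the assumption: the radius feature is written by hand as a stand-in for what a learned representation might capture, and the data and the helper `best_threshold_accuracy` are invented for this example.

```python
import math


def best_threshold_accuracy(values, labels):
    """Accuracy of the best one-feature threshold rule (a very simple model)."""
    best = 0.0
    for threshold in values:
        for sign in (1, -1):
            preds = [1 if sign * (v - threshold) > 0 else 0 for v in values]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            best = max(best, acc)
    return best


# Toy data: class 0 on a small circle, class 1 on a larger ring around it.
points, labels = [], []
for k in range(12):
    angle = 2 * math.pi * k / 12
    points.append((0.5 * math.cos(angle), 0.5 * math.sin(angle)))
    labels.append(0)
    points.append((2.0 * math.cos(angle), 2.0 * math.sin(angle)))
    labels.append(1)

# Raw input feature: the x coordinate alone.
raw_feature = [x for x, _ in points]
# Hand-written stand-in for a learned feature: squared distance from the origin.
radius_feature = [x * x + y * y for x, y in points]

raw_acc = best_threshold_accuracy(raw_feature, labels)
rep_acc = best_threshold_accuracy(radius_feature, labels)
print(raw_acc, rep_acc)
```

The classifier is identical in both runs; only the encoding of the data changes, which is the sense in which representation quality, rather than classifier choice, can dominate performance.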