Convolutional and Sequence Models
Convolutional networks exploit spatial structure in grid-like data such as images, while recurrent and attention-based models process sequences such as text and speech.
Definition
Convolutional models apply learned filters across a grid, so the same feature detector is reused at every location. Sequence models process ordered inputs by maintaining state over time or by attending across positions. Each architecture encodes prior assumptions suited to its data type.
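The reuse of one feature detector at every location can be illustrated with a minimal one-dimensional sketch. The helper `conv1d`, the two-tap `edge_filter`, and the toy signal are all hypothetical names and values chosen for illustration; real convolutional layers learn their filters and usually operate on two-dimensional grids with many channels.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation: the same kernel weights are reused at every position."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# Hypothetical hand-written filter that responds where the signal steps up or down.
edge_filter = [-1.0, 1.0]

signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0] + signal[:-2]  # the same pattern moved two steps to the right

response = conv1d(signal, edge_filter)
shifted_response = conv1d(shifted, edge_filter)

# Translation equivariance: shifting the input shifts the response by the same amount.
is_equivariant = shifted_response[2:] == response[:-2]

# Parameter counts for producing the same number of outputs from n inputs.
n, k = len(signal), len(edge_filter)
conv_params = k                   # one shared filter
dense_params = n * (n - k + 1)    # a separate weight for every input-output pair
```

In this toy case the step-up response appears at index 1 for the original signal and at index 3 for the shifted one, and the shared filter needs 2 weights where an unshared linear map of the same shape would need 56.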
Scope
This topic covers architectures specialized to structured data: convolutional neural networks with local filters, weight sharing, and pooling for images and other grids; recurrent networks and long short-term memory units for sequences with long-range dependencies; and attention mechanisms that model relationships across positions. It addresses the inductive biases that make these architectures effective.
Core questions
- How does convolution exploit translation structure in images?
- Why do weight sharing and pooling help generalization and efficiency?
- How do recurrent networks and long short-term memory units handle long sequences?
- What does attention add over purely recurrent processing?
Key theories
- Convolution and weight sharing
- Convolutional layers apply the same small filter across all positions, dramatically reducing parameters and building in translation equivariance so features learned in one location transfer everywhere.
- Long short-term memory
- Gated recurrent architectures such as long short-term memory maintain a protected memory cell, letting recurrent networks learn dependencies across many time steps that plain recurrence struggles to capture.
- Attention over sequences
- Attention mechanisms let a model weigh and combine information from all positions of a sequence directly, capturing long-range relationships and enabling highly parallel sequence processing.
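As a rough illustration of how attention weighs and combines information from all positions, the sketch below implements scaled dot-product self-attention with plain Python lists. The scaled dot-product form is one common choice and is an assumption here, since the entry does not name a specific variant; the function names and the toy input `x` are hypothetical, and learned projection matrices are omitted for brevity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Compare this query against every key in the sequence.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy sequence of three 2-dimensional positions (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(x, x, x)  # self-attention: queries = keys = values
```

Each row of `weights` sums to 1 and spans every position, so distant positions are related in a single step. The iterations of the loop over queries do not depend on one another, which is what allows the computation to be parallelized in practice.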
Clinical relevance
Convolutional networks revolutionized computer vision and medical imaging, while sequence models powered speech recognition and machine translation and, through attention, the large language models behind modern natural language systems; matching architecture to data structure remains a central design principle in applied deep learning.
History
Convolutional networks grew from Fukushima's neocognitron and LeCun's work on digit recognition, and their 2012 success on large-scale image classification ignited the deep-learning boom. Long short-term memory, introduced in 1997, made long-range dependencies in sequences far easier to learn, and attention mechanisms later became the foundation of transformer models.
Key figures
- Yann LeCun
- Sepp Hochreiter
- Juergen Schmidhuber
- Kunihiko Fukushima
Related topics
Seminal works
- hochreiter1997
- lecun2015
- goodfellow2016
Frequently asked questions
- Why are convolutional networks so good at images?
- Images have local structure and patterns that can appear anywhere. Convolution applies the same filter across the whole image, so a feature like an edge is detected wherever it occurs, using far fewer parameters than a fully connected layer and generalizing better.
- What problem does long short-term memory solve?
- Plain recurrent networks struggle to learn dependencies that span many time steps because gradients propagated back through those steps tend to vanish. Long short-term memory introduces a gated memory cell that preserves information over long intervals, making it possible to learn long-range temporal patterns.
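The gated memory cell can be sketched with a scalar cell, assuming the now-common formulation with forget, input, and output gates. The function `lstm_step`, the `params` dictionary, and its hand-set weights are hypothetical and chosen only to make the behavior visible; a trained network would learn these values. The sketch compares forward-pass retention against a simple tanh recurrence and does not compute gradients.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, p):
    """One step of a scalar LSTM cell; p maps gate names to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h + b
    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much new content to write
    o = sigmoid(pre("output"))       # how much of the cell to expose
    g = math.tanh(pre("candidate"))  # proposed new content
    c_new = f * c + i * g            # additive update of the memory cell
    h_new = o * math.tanh(c_new)
    return h_new, c_new

# Hand-set illustrative weights (assumed, not learned): keep memory by default
# and write to the cell only when the input is 1.
params = {
    "forget": (0.0, 0.0, 6.0),
    "input": (12.0, 0.0, -6.0),
    "output": (0.0, 0.0, 6.0),
    "candidate": (4.0, 0.0, 0.0),
}

inputs = [1.0] + [0.0] * 50  # one signal at the first step, then silence
h, c = 0.0, 0.0
h_rnn = 0.0
for x in inputs:
    h, c = lstm_step(x, h, c, params)
    h_rnn = math.tanh(0.5 * h_rnn + x)  # plain recurrence with hypothetical weight 0.5
```

After the 50 silent steps the cell state `c` is still above 0.8, while the plain recurrent state `h_rnn` has decayed to practically zero. The additive cell update with a forget gate near 1 is the mechanism that lets information survive long intervals.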