Computer Vision

Computer vision is the field concerned with enabling machines to interpret images and video, recovering the geometry, motion, and content of the scenes that produced them.

Definition

Computer vision is the study of methods that take images or video as input and produce descriptions of scene structure, motion, or semantic content as output.

Scope

This area covers the geometry of image formation and camera calibration, the recovery of three-dimensional structure and camera pose from multiple views, the estimation of motion and optical flow over time, and the recognition, detection, and localization of objects and scenes, increasingly via learned models.
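The scope above includes estimating optical flow. One concrete way to do this is the classical Lucas-Kanade method, which this overview does not itself name. The following is a minimal sketch for a single window. It assumes grayscale frames stored as nested Python lists, small motion, and constant flow inside the window. The function and helper names are hypothetical.

```python
# Minimal single-window Lucas-Kanade optical-flow sketch (pure Python).
# Assumptions: grayscale frames are lists of rows indexed as frame[y][x],
# motion is small, and the flow is constant inside the window.

def lucas_kanade_window(frame0, frame1, cx, cy, half=2):
    """Estimate the flow (u, v) at pixel (cx, cy) between two frames.

    Spatial gradients come from central differences on frame0, the temporal
    derivative is frame1 - frame0, and (u, v) is the least-squares solution
    of the brightness-constancy equations Ix*u + Iy*v + It = 0 collected
    over a (2*half + 1) x (2*half + 1) window.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (frame0[y][x + 1] - frame0[y][x - 1]) / 2.0
            iy = (frame0[y + 1][x] - frame0[y - 1][x]) / 2.0
            it = frame1[y][x] - frame0[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        # All gradients in the window are (nearly) parallel or zero:
        # the aperture problem, so the flow is not determined here.
        raise ValueError("flow is not determined in this window")
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = -[sxt, syt]^T by Cramer's rule.
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v


def make_frame(width, height, cx, cy, dx=0.0, dy=0.0):
    """Synthetic test frame: a quadratic 'bowl' centred at (cx + dx, cy + dy)."""
    return [[(x - cx - dx) ** 2 + (y - cy - dy) ** 2 for x in range(width)]
            for y in range(height)]


f0 = make_frame(11, 11, 5, 5)
f1 = make_frame(11, 11, 5, 5, dx=0.1, dy=-0.05)  # pattern moved by (0.1, -0.05)
u, v = lucas_kanade_window(f0, f1, 5, 5)
print(u, v)
```

For this symmetric bowl pattern, the estimate matches the true shift up to floating-point rounding. The window needs gradients in more than one direction. On a straight intensity ramp the 2x2 system is singular, which is the aperture problem. On real images, the window size trades robustness to noise against the assumption that the flow is constant.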

Core questions

How does the geometry of a camera relate 3D scenes to 2D images?
How can 3D structure and camera motion be recovered from images?
How is motion in a scene estimated from a video sequence?
How are objects and categories recognized and localized in images?

Key concepts

Camera projection
Multi-view geometry
3D reconstruction
Optical flow
Object recognition and detection
Learned visual representations

Key theories

Projective geometry of image formation: Cameras are modeled as projective devices mapping 3D points to image points, and the relations between multiple views are captured by entities such as the fundamental and essential matrices, providing the geometric backbone of reconstruction.
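The projective model above can be made concrete with a small sketch. A pinhole camera maps a 3D point X to pixels via x ~ K(RX + t). Consider two calibrated views with camera 1 = [I | 0] and camera 2 = [R | t]. For a true correspondence, the essential matrix E = [t]_x R satisfies x2^T E x1 = 0 on normalized image coordinates. For pixel coordinates, the fundamental matrix F = K2^{-T} E K1^{-1} plays the same role. The sketch makes several illustrative assumptions: both views share one zero-skew intrinsic matrix K, the numeric values are arbitrary, and the helper names are hypothetical. It is written in pure Python for clarity rather than speed.

```python
# Pinhole projection and the epipolar constraint, in pure Python.
# Assumptions (illustrative): camera 1 is K[I | 0], camera 2 is K[R | t],
# K has zero skew, and 3D points are given in camera-1 coordinates.
import math


def matvec(m, v):
    """3x3 matrix times 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) v equals t x v."""
    return [[0.0, -t[2], t[1]],
            [t[2], 0.0, -t[0]],
            [-t[1], t[0], 0.0]]


def rot_y(theta):
    """Rotation by theta radians about the camera's y axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def project(K, R, t, X):
    """Project 3D point X to pixel coordinates (u, v) with x ~ K (R X + t)."""
    Xc = [a + b for a, b in zip(matvec(R, X), t)]  # point in the camera frame
    x = matvec(K, Xc)
    return x[0] / x[2], x[1] / x[2]  # perspective division


def normalize(K, uv):
    """Apply K^-1 for a zero-skew K: pixel -> normalized homogeneous coords."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return [(uv[0] - cx) / fx, (uv[1] - cy) / fy, 1.0]


def epipolar_residual(K, R, t, uv1, uv2):
    """Return x2^T E x1 with E = [t]_x R; near zero for a true match."""
    E = matmul(skew(t), R)
    x1, x2 = normalize(K, uv1), normalize(K, uv2)
    Ex1 = matvec(E, x1)
    return sum(a * b for a, b in zip(x2, Ex1))


K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R, t = rot_y(0.1), [-0.5, 0.0, 0.0]
X = [0.3, -0.2, 4.0]
uv1 = project(K, I3, [0.0, 0.0, 0.0], X)  # view 1
uv2 = project(K, R, t, X)                 # view 2
print(uv1, uv2, epipolar_residual(K, R, t, uv1, uv2))
```

A genuine correspondence gives a residual at the level of floating-point noise. Moving `uv2` a few pixels off its epipolar line gives a clearly nonzero value. This is the property that reconstruction pipelines exploit when they match points across views.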
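Vision as inference of scene structure: Marr framed vision as a computational process that recovers increasingly explicit scene descriptions from images. These descriptions run from a primal sketch of image features, through a viewer-centred 2½-D sketch of surfaces, to an object-centred 3-D model. This layered theory shaped how the field decomposes the problem from early features to objects.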

Applications

Computer vision powers autonomous vehicles and robotics, face and biometric recognition, medical image diagnosis, industrial inspection, augmented reality, and image search, and it is one of the most active application areas of deep learning.

History

Computer vision began in the 1960s and 1970s with the interpretation of line drawings and with shape from shading. Marr's computational theory shaped the 1980s, and geometric multi-view methods matured in the 1990s and 2000s. Starting in the 2010s, deep convolutional networks transformed recognition.

Debates

Geometry-driven versus learning-driven vision: Classical vision emphasized explicit physical and geometric models of image formation, while modern deep learning favors data-driven representations; the field increasingly combines the two, embedding geometric structure into learned systems.

Key figures

David Marr
Richard Hartley
Andrew Zisserman

Seminal works

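Hartley, R., and Zisserman, A. (2004). Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W. H. Freeman.
Szeliski, R. (2022). Computer Vision: Algorithms and Applications, 2nd ed. Springer.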

Frequently asked questions

Is computer vision the same as image processing?
Image processing mainly transforms images into other images or low-level descriptions. Computer vision instead aims to interpret images and recover scene information such as 3D structure, motion, and object identity; vision builds on image processing.

Why is vision hard for computers despite being easy for people?
An image is an ambiguous projection of a 3D world: many scenes can produce the same image, and lighting, viewpoint, occlusion, and clutter vary enormously. Recovering the underlying scene therefore requires strong models or large amounts of learned prior knowledge.
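The ambiguity described in this answer can be seen directly with an ideal pinhole camera. Every point on the ray through the camera centre lands on the same pixel, so depth cannot be read from a single image. This minimal sketch assumes a camera at the origin looking along +Z, illustrative intrinsics (focal length 800 pixels, principal point (320, 240)), and no lens distortion. The function names are hypothetical.

```python
# Depth ambiguity of a single pinhole image (illustrative sketch).
# Assumption: ideal pinhole camera at the origin looking down +Z with
# focal length f (pixels) and principal point (cx, cy); no distortion.

def project(point, f=800.0, cx=320.0, cy=240.0):
    """Map a 3D point (X, Y, Z) with Z > 0 to pixel coordinates (u, v)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (f * X / Z + cx, f * Y / Z + cy)


def scaled(point, s):
    """Slide a point along its viewing ray by scaling it by s > 0."""
    return tuple(s * c for c in point)


near = (0.5, -0.25, 2.0)
for s in (1.0, 3.0, 10.0):
    # The point moves 1x, 3x, and 10x farther away, yet every line
    # prints the same pixel: (520.0, 140.0)
    print(s, project(scaled(near, s)))
```

A single view therefore cannot separate a small, near object from a large, distant one along the same ray. Resolving this takes extra information, such as a second view or prior knowledge about typical scenes.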