Neural Network Architectures

Neural network architectures specify how artificial neurons are connected into layers, defining the family of functions a network can represent.

Definition

A neural network architecture is the arrangement of artificial neurons into connected layers, where each neuron computes a nonlinear function of a weighted sum of its inputs; the architecture determines the network's capacity and the inductive biases it brings to a learning problem.
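The neuron in the definition above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes ReLU as the activation. The names `neuron` and `dense_layer` are hypothetical, and the weights are arbitrary numbers chosen for the example. The last part checks why the nonlinearity matters: with the activation replaced by the identity, two stacked layers are exactly one affine layer.

```python
import numpy as np

def relu(z):
    # Rectified linear unit, applied elementwise: max(0, z)
    return np.maximum(0.0, z)

def identity(z):
    # "No activation": used to show what happens without a nonlinearity
    return z

def neuron(x, w, b, activation=relu):
    # One artificial neuron: a nonlinearity applied to the weighted input sum plus a bias
    return activation(np.dot(w, x) + b)

def dense_layer(x, W, b, activation=relu):
    # A fully connected layer: row i of W holds the weights of neuron i
    return activation(W @ x + b)

x = np.array([1.0, -2.0, 0.5])
w = np.array([0.4, 0.1, -0.6])
print(neuron(x, w, b=0.2))   # ReLU(0.4 - 0.2 - 0.3 + 0.2), about 0.1

# Without a nonlinearity, two stacked layers collapse into one affine map:
# W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
stacked = dense_layer(dense_layer(x, W1, b1, identity), W2, b2, identity)
collapsed = dense_layer(x, W2 @ W1, W2 @ b1 + b2, identity)
print(np.allclose(stacked, collapsed))   # True
```

With `relu` in place of `identity` the collapse no longer holds in general, since ReLU is not linear (for instance, relu(-1) = 0 while -relu(1) = -1).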

Scope

This topic covers the building blocks and structures of neural networks: the artificial neuron with weighted inputs and a nonlinear activation, fully connected feedforward layers and the multilayer perceptron, activation functions such as sigmoid and rectified linear units, and how depth, width, and connectivity shape what a network can learn. It introduces the universal approximation property and the role of architecture choice.

Core questions

How does an artificial neuron compute its output?
What can a multilayer network represent that a single layer cannot?
How do activation functions affect learning?
How do depth and width trade off capacity against trainability?
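The second question has a classic answer: XOR. A single threshold unit that outputs 1 when w1*x1 + w2*x2 + b > 0 cannot compute it, because it would need b ≤ 0, w1 + b > 0, w2 + b > 0 and w1 + w2 + b ≤ 0; adding the middle two gives w1 + w2 + b > -b ≥ 0, a contradiction. One hidden layer is enough. The sketch below assumes perceptron-style step activations and hand-chosen weights (not learned ones); the hidden units compute OR and AND, and the output unit computes "OR and not AND". All names are hypothetical, and the grid search is only a numerical sanity check of the proof.

```python
import numpy as np

def step(z):
    # Perceptron-style threshold: 1 where z > 0, else 0
    return np.where(z > 0, 1, 0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # all binary inputs
y_xor = np.array([0, 1, 1, 0])                   # XOR labels

# Hidden layer, hand-chosen weights: unit 0 computes OR, unit 1 computes AND
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
# Output unit: OR and not AND
v_out = np.array([1.0, -1.0])
b_out = -0.5

def two_layer_xor(x):
    h = step(W1 @ x + b1)
    return int(step(v_out @ h + b_out))

def single_unit_fits_xor(w1, w2, b):
    # Does one threshold unit with weights (w1, w2) and bias b reproduce XOR?
    return np.array_equal(step(X @ np.array([w1, w2]) + b), y_xor)

print([two_layer_xor(x) for x in X])   # [0, 1, 1, 0]
grid = np.linspace(-2.0, 2.0, 21)
print(any(single_unit_fits_xor(p, q, r) for p in grid for q in grid for r in grid))  # False
```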

Key theories

Universal approximation: A feedforward network with a single hidden layer, given enough units and a suitable nonlinear activation such as a sigmoid or ReLU, can approximate any continuous function on a compact (closed and bounded) domain to arbitrary accuracy, establishing neural networks as flexible function approximators.
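Activation functions and nonlinearity: Nonlinear activations are what give multilayer networks their power. Rectified linear units, max(0, z), have a derivative of 1 for every positive input instead of saturating like the sigmoid, whose derivative never exceeds 0.25, so they ease gradient flow; they have become the default choice for deep networks.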
Depth as composition: Adding layers composes transformations so the network builds increasingly abstract features, often representing complex functions more efficiently than a single wide layer.
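The claim that rectified linear units ease gradient flow can be illustrated by looking only at the activation derivatives that backpropagation multiplies together along one path through a deep network. This is a simplified sketch: it deliberately ignores the weights, which also scale the gradient in a real network, and `activation_factor` is a hypothetical helper. Even in the sigmoid's best case (every pre-activation at 0), the factor shrinks geometrically with depth.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z)), at most 0.25 (reached at z = 0)
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of max(0, z): 1 for positive inputs, 0 for negative ones
    return np.where(z > 0, 1.0, 0.0)

def activation_factor(grad_fn, pre_activations):
    # Product of activation derivatives along one path through the layers.
    # Weights are ignored on purpose to isolate the activation's contribution.
    return float(np.prod(grad_fn(np.asarray(pre_activations, dtype=float))))

depth = 20
best_case = np.zeros(depth)   # z = 0 at every layer: the sigmoid's largest derivative
print(activation_factor(sigmoid_grad, best_case))      # 0.25**20, about 9.1e-13
print(activation_factor(relu_grad, best_case + 1.0))   # 1.0 while every unit is active
print(sigmoid_grad(6.0))                               # saturation: about 0.0025
```

ReLU is not a complete fix: its derivative is 0 for negative inputs, so a unit that is inactive on every input passes no gradient at all.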

Practical relevance

The choice of architecture is the main way prior knowledge about a problem is built into a deep model, from fully connected networks for generic data to specialized structures for images and sequences; understanding the artificial neuron and the universal approximation property clarifies both the power and the limits of neural networks.
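The universal approximation property is an existence result: it says suitable weights exist, not how to find them. In one dimension, a single hidden layer of ReLU units can be wired by hand to reproduce the piecewise-linear interpolant of a continuous function, which makes both the power and the cost concrete. This sketch assumes NumPy, ReLU activations and evenly spaced knots; `fit_relu_interpolant` is a hypothetical name, and the weights are computed directly rather than trained.

```python
import numpy as np

def fit_relu_interpolant(f, a, b, n_hidden):
    # Hand-built (not trained) one-hidden-layer ReLU network on [a, b].
    # Knots where the network matches f exactly:
    knots = np.linspace(a, b, n_hidden + 1)
    values = f(knots)
    slopes = np.diff(values) / np.diff(knots)      # slope of f's chord on each segment
    # Output weights: the first slope, then the change of slope at each interior knot
    c = np.concatenate(([slopes[0]], np.diff(slopes)))
    biases = -knots[:-1]                           # hidden unit i computes relu(x - knots[i])

    def network(x):
        x = np.asarray(x, dtype=float)
        hidden = np.maximum(0.0, x[..., None] + biases)   # shape (..., n_hidden)
        return values[0] + hidden @ c
    return network

xs = np.linspace(0.0, np.pi, 1001)
for n_hidden in (4, 16, 64):
    net = fit_relu_interpolant(np.sin, 0.0, np.pi, n_hidden)
    print(n_hidden, np.max(np.abs(net(xs) - np.sin(xs))))   # max error on [0, pi]
```

The maximum error shrinks as hidden units are added; for a smooth function like sin, linear-interpolation error scales with the square of the knot spacing. The limit shows up in the unit count: a grid-style construction like this one needs a number of pieces that grows exponentially with the input dimension.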

History

The artificial neuron traces to McCulloch and Pitts (1943) and to Rosenblatt's perceptron (1958). Minsky and Papert's 1969 critique of single-layer networks slowed the field until multilayer networks trained by backpropagation revived it in the 1980s, and the deep-learning era brought architectures of dozens or hundreds of layers built from rectified linear units and other components.

Key figures

Frank Rosenblatt
Geoffrey Hinton
Yann LeCun

Seminal works

goodfellow2016
bishop2006
lecun2015

Frequently asked questions

What is an activation function and why is it needed?: An activation function applies a nonlinear transformation to a neuron's weighted input sum. Without it, stacking layers would only produce another linear function, so the nonlinearity is what lets deep networks represent complex, nonlinear relationships.
If one wide layer can approximate any function, why go deep?: Universal approximation says a shallow network can in principle fit any function, but it may need impractically many neurons. Deep networks often represent the same functions far more compactly and learn useful hierarchical features, which is why depth is preferred in practice.
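The second answer's point that depth can be far more compact than width has a standard one-dimensional illustration. A "tent" function on [0, 1] needs only two ReLU units; composing it k times, which is a network of k layers with 2 units each, yields a sawtooth with 2^k linear pieces. A single hidden layer of m ReLU units on a scalar input has at most m + 1 linear pieces, so it would need at least 2^k - 1 units to match. The sketch assumes NumPy; all function names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def tent(x):
    # Two ReLU units: rises from 0 to 1 on [0, 0.5], falls back to 0 on [0.5, 1]
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_sawtooth(x, depth):
    # Compose the tent map `depth` times: `depth` layers, 2 units each
    for _ in range(depth):
        x = tent(x)
    return x

def count_linear_pieces(y):
    # Pieces = number of slope sign changes + 1 (slopes here are never zero)
    signs = np.sign(np.diff(y))
    return int(np.count_nonzero(signs[1:] != signs[:-1])) + 1

xs = np.linspace(0.0, 1.0, 4097)   # dyadic grid, so every breakpoint is a sample point
for depth in range(1, 7):
    pieces = count_linear_pieces(deep_sawtooth(xs, depth))
    print(depth, 2 * depth, pieces)   # layers, total ReLU units, linear pieces
```

Twelve ReLU units arranged in six layers produce 64 pieces here, while a single hidden layer would need at least 63 units for the same function.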