Why are Bayesian networks more compact than a full joint distribution?

A full joint distribution over n binary variables needs about 2^n numbers. A Bayesian network only stores, for each variable, its probability given its parents, so when each variable has few parents the total number of parameters grows roughly linearly rather than exponentially in the number of variables.

What does d-separation tell you?

d-separation is a graphical test that determines, from the network structure alone, whether two sets of variables are conditionally independent given a third set of observed variables. It lets you read independence relationships off the graph without examining the actual probability numbers.

Bayesian Networks

A Bayesian network is a directed acyclic graph whose nodes are random variables and whose edges encode conditional dependencies, providing a compact representation of a joint probability distribution.

ค้นหาหัวข้อด้วย PaperMindเร็ว ๆ นี้Find papers & topics

Tools & resources

ดาวน์โหลดสไลด์

Learn & explore

วิดีโอเร็ว ๆ นี้

Definition

A Bayesian network is a probabilistic graphical model consisting of a directed acyclic graph over random variables together with a conditional probability distribution for each variable given its parents, which jointly define a full distribution over all the variables.

Scope

This topic covers the structure and semantics of Bayesian (belief) networks: the directed acyclic graph, local conditional probability distributions, the chain-rule factorization of the joint distribution, and the independence relations they encode (the Markov condition and d-separation). It addresses how a network is read as a model of conditional independence and how it compactly stores an exponentially large distribution. Inference algorithms over these networks are treated in the related probabilistic-inference topic, and learning their structure or parameters from data belongs to the machine-learning subfield.

Core questions

How does a directed acyclic graph plus local conditional distributions specify a full joint distribution?
What conditional independence relations does the network's structure encode?
How does d-separation determine whether two variables are independent given observed evidence?
Why does the factored representation require far fewer numbers than the full joint distribution?

Key concepts

directed acyclic graph
conditional probability tables
chain-rule factorization
Markov condition
d-separation
parents and descendants
compact joint distribution
graphical model

Key theories

Factorization via the Markov condition: A Bayesian network asserts that each variable is conditionally independent of its non-descendants given its parents, so the joint distribution factors into the product of each variable's conditional distribution given its parents, yielding an enormous saving in parameters.
d-separation and independence: The graphical criterion of d-separation reads conditional independencies directly off the network structure, characterizing exactly which independence statements are implied by the graph regardless of the numerical parameters.
Belief networks as plausible inference: Pearl's belief-network framework showed how local conditional probabilities and message passing capture coherent plausible inference, establishing directed graphical models as a sound and practical tool for representing uncertain knowledge.

Clinical relevance

Bayesian networks are used for medical diagnosis, fault and risk analysis, sensor fusion, gene-regulatory and other biological network modeling, and decision support, because they make complex probabilistic dependencies explicit and let evidence be propagated to update beliefs about unobserved variables.

History

Bayesian networks were developed by Judea Pearl in the 1980s as a graphical formalism for plausible inference, set out fully in his 1988 book. They unified earlier probabilistic and graphical ideas, became the canonical directed graphical model, and were later extended and systematized in the probabilistic-graphical-models literature.

Key figures

Judea Pearl
Daphne Koller
Nir Friedman
David Heckerman

Seminal works

pearl1986
pearl1988

Frequently asked questions

Why are Bayesian networks more compact than a full joint distribution?: A full joint distribution over n binary variables needs about 2^n numbers. A Bayesian network only stores, for each variable, its probability given its parents, so when each variable has few parents the total number of parameters grows roughly linearly rather than exponentially in the number of variables.
What does d-separation tell you?: d-separation is a graphical test that determines, from the network structure alone, whether two sets of variables are conditionally independent given a third set of observed variables. It lets you read independence relationships off the graph without examining the actual probability numbers.