Reward and Decision-Making

Reward processing and value-based decision-making concern how the brain represents the value of outcomes, learns from the consequences of actions, and chooses among options. Midbrain dopamine neurons signal discrepancies between expected and received reward, and a network including the striatum, orbitofrontal cortex, and ventromedial prefrontal cortex computes and compares the values of options to guide behaviour.

Definition

Reward and decision-making is the study of how the brain assigns value to outcomes, updates expectations through learning from prediction errors, and uses these value representations to select among competing actions.

Scope

This topic covers the neuroscience of reward and value-based decision-making as reference material in cognitive neuroscience. It introduces reward prediction-error signalling, the brain's valuation systems, reinforcement-learning frameworks, and the relevance of these circuits to motivation and to disorders of reward. It explains mechanisms and evidence and is not clinical guidance.

Core questions

How does the brain represent the value of different outcomes and options?
How do dopamine signals and reinforcement-learning mechanisms allow the brain to learn from reward and punishment?
Which regions compute, compare, and act on value during decision-making?

Key concepts

Reward prediction error
Phasic dopamine signalling
Reinforcement learning and temporal-difference learning
Subjective and expected value
Orbitofrontal and ventromedial prefrontal valuation
Striatum and action value
Exploration versus exploitation
Reward-related disorders

Key theories

Reward prediction-error hypothesis of dopamine: Phasic activity of midbrain dopamine neurons encodes a reward prediction error, the difference between received and expected reward, providing a teaching signal of the kind used in temporal-difference reinforcement learning to update value estimates.
Value-based decision-making framework: Choice is decomposed into five stages (representation of options, valuation, action selection, outcome evaluation, and learning), so that distinct neural systems can be mapped onto each computational step rather than treating decision-making as a single process.
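
The staged framework above can be sketched as a simple agent choosing between two options. This is a minimal illustration under stated assumptions, not a model taken from the cited work: the option names, reward magnitudes, learning rate `alpha`, and softmax inverse temperature `beta` are all hypothetical, and the delta-rule update is only one common way to implement the learning stage.

```python
import math
import random


def softmax_choice(values, beta, rng):
    """Action selection: sample an option with probability proportional to exp(beta * value)."""
    options = list(values)
    weights = [math.exp(beta * values[o]) for o in options]
    return rng.choices(options, weights=weights, k=1)[0]


def run_agent(rewards, n_trials=200, alpha=0.1, beta=3.0, seed=0):
    """Run the staged loop; `rewards` maps each option to a fixed (hypothetical) payoff."""
    rng = random.Random(seed)
    # Representation: the set of available options.
    # Valuation: a learned value per option, starting with no expectation.
    values = {option: 0.0 for option in rewards}
    counts = {option: 0 for option in rewards}
    for _ in range(n_trials):
        # Action selection: compare values on a common scale and choose.
        choice = softmax_choice(values, beta, rng)
        counts[choice] += 1
        # Outcome evaluation: received reward minus expected reward.
        prediction_error = rewards[choice] - values[choice]
        # Learning: move the chosen option's value toward the outcome.
        values[choice] += alpha * prediction_error
    return values, counts


values, counts = run_agent({"A": 1.0, "B": 0.2})
print(values, counts)
```

In this sketch a larger `beta` makes the agent exploit the currently higher-valued option more consistently, while a smaller `beta` produces more exploratory choices.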

Mechanisms

A central mechanism is the reward prediction error: midbrain dopamine neurons increase firing when an outcome is better than expected and decrease firing when it is worse, a pattern that matches the teaching signal of temporal-difference reinforcement learning (Schultz et al., 1997). These signals are thought to update value representations in target regions, particularly the striatum, where neuronal activity reflects the value of available actions (Samejima et al., 2005). Orbitofrontal and ventromedial prefrontal cortex represent the value of goods and options on a common scale that allows comparison across choices (Wallis, 2007). Decision-making can be analysed as a sequence of computational stages (representation, valuation, action selection, outcome evaluation, and learning), each supported by partly distinct circuits (Rangel et al., 2008).
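
To make the prediction-error mechanism concrete, the following sketch applies the standard temporal-difference error, delta = r + gamma * V(next state) - V(current state), to a single cue followed by an outcome. It is an illustration under simplifying assumptions rather than a model from the cited studies: the function name `run_trial`, the parameters `alpha` and `gamma`, and the choice to hold the pre-cue value at zero (the cue is assumed to arrive unpredictably) are all hypothetical.

```python
def run_trial(v_cue, reward, alpha=0.2, gamma=1.0):
    """Simulate one cue -> outcome trial.

    Returns the updated cue value and the prediction errors at cue onset
    and at outcome time.
    """
    v_pre_cue = 0.0   # assumption: cue timing is unpredictable, so nothing is expected beforehand
    v_terminal = 0.0  # the trial ends after the outcome

    # Error at cue onset: no reward yet, but the cue state may carry value.
    delta_cue = 0.0 + gamma * v_cue - v_pre_cue
    # Error at outcome: received reward versus what the cue predicted.
    delta_outcome = reward + gamma * v_terminal - v_cue
    # Only the cue value is learned here; the pre-cue value stays fixed.
    v_cue += alpha * delta_outcome
    return v_cue, delta_cue, delta_outcome


v_cue = 0.0
history = []
for _ in range(30):  # 30 rewarded trials
    v_cue, d_cue, d_out = run_trial(v_cue, reward=1.0)
    history.append((d_cue, d_out))
_, d_cue_omit, d_out_omit = run_trial(v_cue, reward=0.0)  # expected reward omitted

print(history[0])    # first trial: (0.0, 1.0)
print(history[-1])   # late trial: error near zero at outcome, positive at cue
print(d_cue_omit, d_out_omit)  # omission: negative error at outcome time
```

Early in learning the error occurs at the outcome. After learning it is close to zero at the outcome and positive at the cue, and omitting an expected reward yields a negative error.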

Clinical relevance

Reward and valuation circuits shape how researchers and clinicians understand motivation and a range of conditions, including addiction, depression, and the effects of dopaminergic disease and treatment. Altered reinforcement learning in Parkinson's disease is one example (Frank et al., 2004). This entry is an educational reference on reward and decision mechanisms and is not a basis for diagnosing or treating any individual.
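
One way to picture altered reinforcement learning is to give better-than-expected and worse-than-expected outcomes separate learning rates. The sketch below is a deliberately simplified assumption, not the model or task used by Frank et al. (2004) and not a clinical tool; `alpha_gain`, `alpha_loss`, and the outcome sequence are hypothetical.

```python
def learn(outcomes, alpha_gain, alpha_loss, value=0.0):
    """Update a value estimate, using a different learning rate by prediction-error sign."""
    for outcome in outcomes:
        error = outcome - value
        # Positive errors use alpha_gain; zero or negative errors use alpha_loss.
        rate = alpha_gain if error > 0 else alpha_loss
        value += rate * error
    return value


outcomes = [1.0, -1.0] * 20  # alternating gains and losses of equal size

balanced = learn(outcomes, alpha_gain=0.2, alpha_loss=0.2)
gain_biased = learn(outcomes, alpha_gain=0.3, alpha_loss=0.1)
loss_biased = learn(outcomes, alpha_gain=0.1, alpha_loss=0.3)
print(balanced, gain_biased, loss_biased)
```

Given identical outcomes, the gain-weighted learner ends with a higher value estimate than the loss-weighted learner, which illustrates how a change in sensitivity to positive versus negative prediction errors can by itself shift learned values.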

Evidence & guidelines

The account rests on convergent evidence from single-unit recording in animals, human neuroimaging, computational modelling, and studies of patients with dopaminergic disorders (Schultz et al., 1997; Samejima et al., 2005; Frank et al., 2004), synthesised in major reviews of valuation and choice (Rangel et al., 2008; Wallis, 2007).

History

Early electrical self-stimulation experiments in the 1950s identified brain regions whose activation animals would work to obtain, establishing the idea of a reward system. Through the 1980s and 1990s, recordings of midbrain dopamine neurons by Schultz and colleagues, interpreted with reinforcement-learning theory developed by Sutton and Barto and applied by Montague and Dayan, recast dopamine as a prediction-error signal rather than a pleasure signal. The subsequent emergence of neuroeconomics integrated economic theories of value with neuroscience to study how the brain computes and compares value during choice.

Debates

What exactly does dopamine encode? The prediction-error account is influential, but debate continues over whether phasic dopamine strictly signals a reward prediction error or also conveys salience, novelty, or motivational vigour, and over how tonic and phasic signals differ in function.

Key figures

Wolfram Schultz
Peter Dayan
P. Read Montague
Antonio Rangel
Michael Frank

Seminal works

Schultz et al. (1997)
Rangel et al. (2008)
Wallis (2007)

Frequently asked questions

What is a reward prediction error? It is the difference between the reward an outcome delivers and the reward that was expected. Midbrain dopamine neurons signal this difference, firing more for better-than-expected outcomes and less for worse-than-expected ones, which provides a learning signal that updates future expectations.
Is dopamine the brain's 'pleasure chemical'? That popular description is misleading. Much evidence indicates that phasic dopamine signals primarily relate to learning and the prediction of reward rather than to the experience of pleasure itself, which appears to involve other systems.