Policy Gradient Methods

Policy gradient methods are reinforcement learning algorithms that optimize a parameterized policy directly, by gradient ascent on the expected return, rather than learning action values and then acting greedily with respect to them. Building on Ronald Williams' REINFORCE algorithm (1992) and the policy gradient theorem of Sutton and colleagues (2000), they handle stochastic policies and continuous action spaces naturally, and they form the basis of modern actor-critic and deep RL algorithms.
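To make "gradient ascent on the expected return" concrete, here is a minimal sketch of a REINFORCE-style update on a toy multi-armed bandit. It is an illustration under stated assumptions, not the algorithm as presented in either cited paper: the softmax policy, the Gaussian reward model, the running-mean baseline, and the names `softmax` and `reinforce_bandit` are all hypothetical choices made for this example. Each step samples an action from the policy, observes a reward, and moves the parameters along (reward minus baseline) times the gradient of the log-probability of the sampled action.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=2000, lr=0.1, noise=0.5, seed=0):
    """REINFORCE-style sketch on a bandit (assumed toy setting).

    arm_means: assumed mean reward of each arm (illustrative values).
    Returns the final action probabilities of the softmax policy.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # policy parameters: one preference per arm
    baseline = 0.0                  # running mean of past rewards

    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current stochastic policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], noise)

        # The baseline uses only earlier rewards, so it lowers variance
        # without biasing the gradient estimate.
        advantage = reward - baseline
        for k in range(len(theta)):
            # For a softmax policy: d log pi(action) / d theta_k
            #   = 1[k == action] - pi(k)
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log  # gradient ascent step

        baseline += (reward - baseline) / t  # update running mean

    return softmax(theta)


if __name__ == "__main__":
    # Two arms; the second has the higher assumed mean reward.
    print(reinforce_bandit([0.0, 1.0]))
```

The policy never estimates action values explicitly; it only shifts probability mass toward actions that returned more than the baseline. Replacing the running-mean baseline with a learned value function is the step that leads toward actor-critic methods.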

Sources

  1. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4), 229–256. DOI: 10.1007/BF00992696
  2. Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 12, 1057–1063.

How to cite this page

ScholarGate. (2026, June 2). Policy Gradient Methods (REINFORCE / Actor-Critic). ScholarGate. https://scholargate.app/da/machine-learning/policy-gradient

Referenced by

ScholarGate. Policy Gradient (Policy Gradient Methods (REINFORCE / Actor-Critic)). Retrieved 2026-06-15 from https://scholargate.app/da/machine-learning/policy-gradient · Dataset: https://doi.org/10.5281/zenodo.20539026