Policy Gradient Methods

Policy gradient methods are reinforcement-learning algorithms that optimize a parameterized policy directly, by gradient ascent on the expected return, rather than learning action values and acting greedily with respect to them. Building on Ronald Williams' 1992 REINFORCE algorithm and the policy gradient theorem of Sutton and colleagues (2000), they handle stochastic policies and continuous action spaces naturally, and they underpin modern actor-critic and deep-RL algorithms.
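To make the idea of gradient ascent on the expected return concrete, here is a minimal sketch of a REINFORCE-style update. It assumes a toy two-armed bandit with Gaussian rewards, a softmax policy over per-arm preferences, and a running-average reward baseline. The problem setup, the function names (softmax, reinforce_bandit), and the hyperparameters are all illustrative assumptions, not taken from the cited sources. Each step samples an action, observes a reward, and moves the parameters along (reward - baseline) times the gradient of the log-probability of the chosen action.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=5000, lr=0.1, seed=0):
    """REINFORCE-style sketch on a hypothetical Gaussian bandit.

    arm_means: assumed mean reward of each arm (illustrative only).
    Returns the action probabilities of the learned policy.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # policy parameters: one preference per arm
    baseline = 0.0                  # running average of observed rewards

    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current stochastic policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 1.0)

        # Subtracting a baseline reduces variance without biasing the gradient.
        advantage = reward - baseline
        baseline += (reward - baseline) / t

        # For a softmax policy, d/d theta[a] of log pi(action)
        # equals 1[a == action] - pi(a).
        for a in range(len(theta)):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * advantage * grad_log  # gradient ascent step

    return softmax(theta)


final_probs = reinforce_bandit([0.0, 1.0])
print(final_probs)
```

With these assumed settings the policy should shift most of its probability toward the arm with the higher mean reward. Note that no action-value table is learned: only the policy parameters are updated. The same log-probability-times-return update carries over to multi-step episodes by replacing the immediate reward with the return that follows each action.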

Sources

  1. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4), 229–256. DOI: 10.1007/BF00992696
  2. Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 12, 1057–1063.

ScholarGate. Policy Gradient (Policy Gradient Methods (REINFORCE / Actor-Critic)). Retrieved 2026-06-04 from https://scholargate.app/tr/machine-learning/policy-gradient