ScholarGate

Policy Gradient Methods

Policy gradient methods are reinforcement learning algorithms that optimize a parameterized policy directly, by gradient ascent on the expected return, rather than learning action values and acting greedily with respect to them. Building on Ronald Williams's REINFORCE algorithm (1992) and the policy gradient theorem of Sutton et al. (2000), they handle stochastic and continuous action spaces naturally and form the basis of modern actor-critic and deep reinforcement learning algorithms.
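To make "gradient ascent on the expected return" concrete, here is a minimal sketch of a REINFORCE-style update, assuming the simplest possible setting: a one-step problem (a bandit) with a softmax policy over action preferences. The function names, the reward probabilities, the learning rate, and the running-average baseline are illustrative assumptions, not taken from Williams (1992) or from this page. Each update moves the parameters along reward-weighted gradients of the log-probability of the sampled action.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(reward_probs, steps=5000, lr=0.1, seed=0):
    """REINFORCE-style updates on a one-step problem with a softmax policy.

    reward_probs[k] is the (hypothetical) chance that action k pays reward 1.
    theta holds one preference per action; the policy is softmax(theta).
    """
    rng = random.Random(seed)
    theta = [0.0] * len(reward_probs)
    baseline = 0.0  # running mean of past rewards; assumed here to cut variance

    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        advantage = reward - baseline

        # For a softmax policy, d/d theta_k of log pi(action) is
        # 1[k == action] - pi(k). Step uphill, scaled by the advantage.
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_pi

        # Update the baseline after the policy step, so it only uses past data.
        baseline += (reward - baseline) / t

    return softmax(theta)


if __name__ == "__main__":
    # Action 1 pays off more often, so its probability should grow.
    print(reinforce_bandit([0.2, 0.8]))
```

No action values are learned here and no greedy choice is made: the policy's own parameters are adjusted directly. Actor-critic methods replace the simple running-average baseline assumed above with a learned value estimate.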

Sources

  1. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4), 229–256. DOI: 10.1007/BF00992696
  2. Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 12, 1057–1063.

How to cite this page

ScholarGate. (2026, June 2). Policy Gradient Methods (REINFORCE / Actor-Critic). ScholarGate. https://scholargate.app/sw/machine-learning/policy-gradient

Referenced by

ScholarGate. Policy Gradient (Policy Gradient Methods (REINFORCE / Actor-Critic)). Retrieved 2026-06-15 from https://scholargate.app/sw/machine-learning/policy-gradient · Dataset: https://doi.org/10.5281/zenodo.20539026