Policy Gradient Methods
Policy gradient methods are reinforcement learning algorithms that optimize a parameterized policy directly, by gradient ascent on the expected return, rather than learning action values and then acting greedily. Built on Ronald Williams's 1992 REINFORCE algorithm and the policy gradient theorem of Sutton et al. (2000), they handle stochastic and continuous action spaces naturally and form the foundation of modern actor-critic and deep reinforcement learning algorithms.
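As a rough illustration of gradient ascent on the expected return, the sketch below applies a REINFORCE-style update to a hypothetical two-armed bandit. Everything specific in it is an assumption made for the example and does not come from the cited papers: the function names softmax and reinforce_bandit, the Gaussian reward model, the softmax parameterization, the running-average baseline, and the step size. Each update moves the parameters along (reward - baseline) times the gradient of the log-probability of the sampled action.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(reward_means, episodes=3000, lr=0.1, seed=0):
    """REINFORCE-style update on a hypothetical bandit with Gaussian rewards.

    theta holds one preference per action; the policy is softmax(theta).
    Each episode is a single step, so the return is just the reward.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(reward_means)
    baseline = 0.0  # running average of past rewards (assumed variance reducer)
    for t in range(1, episodes + 1):
        probs = softmax(theta)
        # Sample an action from the current stochastic policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rng.gauss(reward_means[action], 1.0)
        advantage = reward - baseline
        # For a softmax policy: d/d theta_k log pi(a) = 1[k == a] - pi(k).
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log  # gradient ascent step
        # Update the baseline only after it has been used for this step.
        baseline += (reward - baseline) / t
    return softmax(theta)


# Arm 1 has the higher mean reward, so its probability should grow.
final_probs = reinforce_bandit([0.0, 2.0])
print(final_probs)
```

Note that no action values are learned and no greedy choice is made: the policy stays stochastic throughout, and only its parameters are adjusted. Replacing the running-average baseline with a learned value estimate is the step that leads toward actor-critic methods.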
Sources
- Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4), 229–256. DOI: 10.1007/BF00992696
- Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 12, 1057–1063.
How to cite this page
ScholarGate. (2026, June 2). Policy Gradient Methods (REINFORCE / Actor-Critic). ScholarGate. https://scholargate.app/sw/machine-learning/policy-gradient