Q-Learning
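Q-learning is a model-free reinforcement learning algorithm, introduced by Christopher Watkins in 1989 and given its convergence proof by Watkins and Peter Dayan in 1992. It learns the value of taking each action in each state (the Q-function) from experience alone, without a model of the environment. It is off-policy: it learns the optimal action values while following an exploratory behaviour policy. In the tabular setting, provided every state-action pair keeps being visited and the step sizes decay appropriately, it is proven to converge to the optimal action values, and therefore to an optimal policy.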
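To make the idea concrete, here is a minimal sketch of tabular Q-learning. It applies the standard one-step update Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)] while acting with an epsilon-greedy behaviour policy. The corridor environment, the function names `q_learning` and `greedy_policy`, and the values chosen for `alpha`, `gamma` and `epsilon` are all assumptions made for illustration; none of them come from the sources below.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. Action 0 moves left (bounded at state 0),
    action 1 moves right. Entering the last state yields reward 1 and
    ends the episode; every other transition yields reward 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # Q-table: one row per state, one entry per action, initialised to 0.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Behaviour policy: epsilon-greedy with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(2) if q[s][i] == best])

            # Environment step (illustrative dynamics, no model is learned).
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # Off-policy target: bootstrap from the greedy value of s_next,
            # regardless of which action the behaviour policy takes next.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action in each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

Calling `greedy_policy(q_learning())` should yield "move right" for every non-terminal state in this toy corridor. The `max` over next-state values in the target is what makes the method off-policy: the learned values describe the greedy policy even though the agent keeps exploring.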
Sources
- Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292. DOI: 10.1007/BF00992698
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. ISBN: 978-0-262-03924-6
How to cite this page
ScholarGate. (2026, June 2). Q-Learning (Off-Policy Temporal-Difference Control). ScholarGate. https://scholargate.app/sw/machine-learning/q-learning