Q-Learning

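Q-learning, introduced by Christopher Watkins in 1989 and given a convergence proof by Watkins and Peter Dayan in 1992, is a model-free reinforcement learning algorithm. It learns the Q-function, the value of taking each action in each state, purely from experience and without a model of the environment. It is off-policy: it learns the optimal action values while following an exploratory behavior policy. Under standard conditions (every state-action pair keeps being sampled and the action values are stored in a table), its estimates are proven to converge to the optimal action values with probability 1, and acting greedily on those values yields an optimal policy.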
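
The off-policy idea can be made concrete with the standard tabular update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. The following is a minimal sketch, not a reference implementation. It assumes a hypothetical five-state chain environment with a reward of 1.0 at the right end; the names `step`, `q_learning`, and `greedy_policy` and the values of `alpha`, `gamma`, and `epsilon` are illustrative choices that do not come from the source.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# states 0..4 in a chain, state 4 is the terminal goal.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition; reward 1.0 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behavior policy: explore with probability epsilon,
            # otherwise act greedily (ties broken at random).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best next action,
            # regardless of which action the behavior policy takes next.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table))
```

The target uses the maximum over next actions rather than the action actually taken next, which is what separates Q-learning from on-policy methods such as SARSA. In this toy chain the greedy policy read off the learned table moves right in every state, even though the agent kept exploring throughout training.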

Related methods

Deep reinforcement learning, dynamic programming, policy gradient methods

Sources

Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292. DOI: 10.1007/BF00992698
Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. ISBN: 978-0-262-03924-6

How to cite this page

ScholarGate. (2026, June 2). Q-Learning (Off-Policy Temporal-Difference Control). ScholarGate. https://scholargate.app/ko/machine-learning/q-learning

Entries that reference this method

Policy gradient methods
