
Weakly Supervised Reinforcement Learning

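Weakly supervised reinforcement learning (WSRL) trains an agent in environments where the reward signal is incomplete, sparse, delayed, or only partially informative, in contrast to reinforcement learning with clear supervision. The agent must still learn an effective policy from this imperfect feedback, and it compensates for the weak supervision with auxiliary signals, reward modeling, or preference learning.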
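One way to make the preference-learning route concrete is a small reward model fitted to pairwise comparisons of trajectory segments. The sketch below is illustrative only: it assumes a linear reward over hand-made per-step features, noise-free synthetic preference labels, and a Bradley-Terry style likelihood, in the spirit of learning from human preferences. The names `train_reward_model`, `make_pairs`, `TRUE_WEIGHTS`, and all hyperparameters are hypothetical and not taken from any particular library or paper.

```python
import math
import random


def features_sum(segment):
    """Sum the per-step feature vectors of a trajectory segment."""
    return [sum(column) for column in zip(*segment)]


def preference_prob(weights, phi_a, phi_b):
    """Bradley-Terry probability that segment A is preferred over segment B."""
    diff = sum(w * (a - b) for w, a, b in zip(weights, phi_a, phi_b))
    diff = max(-50.0, min(50.0, diff))  # clip to keep exp() numerically safe
    return 1.0 / (1.0 + math.exp(-diff))


def train_reward_model(pairs, n_features, lr=0.1, epochs=50):
    """Fit linear reward weights by SGD on the preference cross-entropy loss.

    pairs: list of (segment_a, segment_b, label), label 1.0 if A is preferred.
    """
    weights = [0.0] * n_features
    data = [(features_sum(a), features_sum(b), label) for a, b, label in pairs]
    for _ in range(epochs):
        for phi_a, phi_b, label in data:
            p = preference_prob(weights, phi_a, phi_b)
            # Gradient of the loss w.r.t. w_i is (p - label) * (phi_a_i - phi_b_i).
            for i in range(n_features):
                weights[i] -= lr * (p - label) * (phi_a[i] - phi_b[i])
    return weights


def make_pairs(rng, true_weights, n_pairs, seg_len=5):
    """Synthetic weak supervision: only 'which segment is better' is revealed."""
    def random_segment():
        return [[rng.uniform(-1, 1) for _ in true_weights] for _ in range(seg_len)]

    def true_return(segment):
        return sum(w * x for w, x in zip(true_weights, features_sum(segment)))

    pairs = []
    for _ in range(n_pairs):
        a, b = random_segment(), random_segment()
        pairs.append((a, b, 1.0 if true_return(a) > true_return(b) else 0.0))
    return pairs


def agreement(weights, pairs):
    """Fraction of pairs where the learned reward ranks segments like the labels."""
    hits = 0
    for a, b, label in pairs:
        p = preference_prob(weights, features_sum(a), features_sum(b))
        hits += (p > 0.5) == (label == 1.0)
    return hits / len(pairs)


rng = random.Random(0)
TRUE_WEIGHTS = [1.0, -1.0]  # hidden reward, never shown to the learner directly
train_pairs = make_pairs(rng, TRUE_WEIGHTS, 200)
test_pairs = make_pairs(rng, TRUE_WEIGHTS, 100)

learned = train_reward_model(train_pairs, n_features=2)
accuracy = agreement(learned, test_pairs)
print("learned weights:", learned)
print("held-out preference agreement:", accuracy)
```

The learner never sees a numeric reward, only comparisons, yet the fitted weights can stand in as a dense reward for an ordinary policy-learning algorithm. Real systems typically replace the linear model with a neural network and must cope with noisy or inconsistent labels.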


Related methods

Reinforcement learning, self-supervised reinforcement learning, semi-supervised reinforcement learning

Sources

Sutton, R. S. & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. ISBN 978-0-262-03924-6.
Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S. & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems (NeurIPS), 30.

How to cite this page

ScholarGate. (2026, June 3). Weakly Supervised Reinforcement Learning. ScholarGate. https://scholargate.app/ko/deep-learning/weakly-supervised-reinforcement-learning


