
Fine-Tuned Reinforcement Learning

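Fine-Tuned Reinforcement Learning adapts a pre-trained policy or model to a new task or behavioral objective by using reinforcement signals (including human feedback) instead of retraining it from scratch. Popularized by RLHF (Reinforcement Learning from Human Feedback), it is a core technique for aligning large language models and for adapting deep reinforcement learning agents to specialized environments with minimal additional data.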
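To make the idea concrete, here is a minimal sketch of fine-tuning a pre-trained policy with a reward signal. It is a toy illustration under stated assumptions, not the method of any particular paper: the three-action softmax policy, the reward vector, and the values of `beta`, `lr`, and `steps` are all hypothetical. The update is an exact policy-gradient step on expected reward minus a KL penalty that keeps the tuned policy close to the pre-trained one, a regularizer commonly used in RLHF-style setups.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def fine_tune(ref_logits, rewards, beta=0.1, lr=0.5, steps=200):
    """Adapt a pre-trained softmax policy to a new reward signal.

    Maximizes E_pi[reward] - beta * KL(pi || pi_ref) by gradient ascent
    on the logits, starting from the pre-trained (reference) logits.
    All hyperparameter defaults are illustrative assumptions.
    """
    ref_logits = np.asarray(ref_logits, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    ref_probs = softmax(ref_logits)
    logits = ref_logits.copy()  # start from the pre-trained policy, not from scratch
    for _ in range(steps):
        probs = softmax(logits)
        # Reward shaped by the KL penalty toward the reference policy.
        shaped = rewards - beta * (np.log(probs) - np.log(ref_probs))
        baseline = probs @ shaped  # expected shaped reward under the current policy
        # Exact gradient of the objective with respect to the softmax logits.
        grad = probs * (shaped - baseline)
        logits += lr * grad
    return softmax(logits)

# Hypothetical pre-trained policy that prefers action 0.
ref_logits = np.array([2.0, 1.0, 0.0])
# Hypothetical reinforcement signal (e.g. from human feedback) that prefers action 2.
rewards = np.array([0.0, 0.0, 1.0])

before = softmax(ref_logits)
after = fine_tune(ref_logits, rewards)
print("before:", before.round(3))
print("after: ", after.round(3))
```

In this sketch, a larger `beta` keeps the tuned policy nearer to its pre-trained behavior, while a smaller `beta` lets the reward signal dominate. Real systems estimate the gradient from sampled trajectories and learned reward models rather than computing it exactly.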

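Related methods: BERT-based fine-tuned classification; fine-tuned transformer reinforcement learning; self-supervised reinforcement learning; transfer learning in reinforcement learning; multilingual reinforcement learning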

Sources

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30.

How to cite this page

ScholarGate. (2026, June 3). Fine-Tuned Reinforcement Learning (Policy Adaptation via Fine-Tuning). ScholarGate. https://scholargate.app/ko/deep-learning/fine-tuned-reinforcement-learning


Entries that reference this method

Multilingual reinforcement learning; transfer learning in reinforcement learning
