
Fine-Tuned Reinforcement Learning

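Fine-Tuned Reinforcement Learning adapts a pretrained policy or model to a new task or behavioral objective using reinforcement learning signals, including human feedback, rather than retraining from scratch. Popularized by RLHF (reinforcement learning from human feedback), it has become a core technique for aligning large language models and for adapting deep reinforcement learning agents to specialized environments with minimal additional data.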
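As a rough illustration of the idea, the sketch below adapts a tiny "pretrained" softmax policy to a new reward signal while penalizing drift from the original policy with a KL term, which is one common way RLHF-style fine-tuning is set up. Everything here is an assumption made for illustration: the three-action setting, the logits, the rewards, the `fine_tune` helper, and the values of `beta`, `lr`, and `steps` are hypothetical, and the exact expected gradient is used in place of sampled rollouts so that the example is deterministic.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fine_tune(pretrained_logits, rewards, beta=0.5, lr=0.5, steps=500):
    """Adapt a pretrained softmax policy to a reward signal.

    Maximizes E_pi[r(a)] - beta * KL(pi || pi_ref) by gradient ascent on
    the logits, starting from the pretrained policy instead of from scratch.
    """
    ref = softmax(pretrained_logits)   # frozen reference (pretrained) policy
    logits = list(pretrained_logits)   # trainable copy, initialized from it
    for _ in range(steps):
        pi = softmax(logits)
        # Task reward minus a penalty for drifting away from the reference.
        shaped = [r - beta * (math.log(p) - math.log(q))
                  for r, p, q in zip(rewards, pi, ref)]
        baseline = sum(p * s for p, s in zip(pi, shaped))
        # Exact policy gradient for a softmax policy: pi_k * (shaped_k - baseline).
        logits = [z + lr * p * (s - baseline)
                  for z, p, s in zip(logits, pi, shaped)]
    return softmax(logits)


pretrained = [2.0, 0.5, 0.0]   # hypothetical pretrained preferences over 3 actions
rewards = [0.0, 0.0, 1.0]      # hypothetical new objective: favor action 2
print("before:", softmax(pretrained))
print("after: ", fine_tune(pretrained, rewards))
```

The KL weight `beta` controls the trade-off: a larger value keeps the adapted policy closer to the pretrained one, while a smaller value lets the reward dominate. In practice the gradient is estimated from sampled trajectories (for example with PPO, as in Ouyang et al., 2022) rather than computed exactly.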


Related methods

Fine-Tuned BERT-Based Classification
Fine-Tuned Transformer
Reinforcement Learning
Self-Supervised Reinforcement Learning
Transfer Learning in Reinforcement Learning
Multilingual Reinforcement Learning

Sources

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30.

How to cite this page

ScholarGate. (2026, June 3). Fine-Tuned Reinforcement Learning (Policy Adaptation via Fine-Tuning). ScholarGate. https://scholargate.app/ja/deep-learning/fine-tuned-reinforcement-learning


Entries that reference this method

Multilingual Reinforcement Learning
Transfer Learning in Reinforcement Learning
