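Fine-Tuned Reinforcement Learning
Fine-tuned reinforcement learning adapts a pretrained policy or model to a new task or behavioral objective through reinforcement signals (including human feedback) rather than training from scratch. Popularized by RLHF, it is a core technique for aligning large language models and for adapting deep reinforcement learning agents to specific environments with minimal additional data.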
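The idea of starting from a pretrained policy and nudging it with a reward signal can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not taken from this page: a toy 3-action bandit, the hypothetical names `fine_tune`, `pretrained_logits`, and `rewards`, and a KL penalty (weight `beta`) that keeps the tuned policy close to the pretrained one, a regularizer commonly used in RLHF-style fine-tuning. The gradient is computed exactly over the three actions rather than sampled, which keeps the example deterministic.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fine_tune(pretrained_logits, rewards, beta=0.1, lr=0.5, steps=500):
    """Gradient ascent on E_pi[r] - beta * KL(pi || pi_ref), starting from
    the pretrained logits instead of a random initialization.
    (Hypothetical helper; a sketch, not a reference implementation.)"""
    ref = softmax(pretrained_logits)   # frozen pretrained (reference) policy
    logits = list(pretrained_logits)   # trainable copy, initialized from it
    for _ in range(steps):
        pi = softmax(logits)
        # KL-shaped reward: task reward minus a penalty for drifting from ref.
        shaped = [r - beta * math.log(p / q)
                  for r, p, q in zip(rewards, pi, ref)]
        baseline = sum(p * s for p, s in zip(pi, shaped))
        # Exact policy gradient for a softmax policy:
        # dJ/dlogit_k = pi_k * (shaped_k - baseline)
        logits = [x + lr * p * (s - baseline)
                  for x, p, s in zip(logits, pi, shaped)]
    return softmax(logits), ref

# Assumed toy setup: the pretrained policy prefers action 0,
# while the new task rewards action 2.
pretrained_logits = [2.0, 0.0, 0.0]
rewards = [0.0, 0.0, 1.0]
tuned, ref = fine_tune(pretrained_logits, rewards)
print("pretrained:", [round(p, 3) for p in ref])
print("fine-tuned:", [round(p, 3) for p in tuned])
```

In this sketch a larger `beta` holds the tuned policy nearer to the pretrained one, while a smaller `beta` lets the reward dominate. Real systems replace the exact gradient with sampled estimates (for example PPO) and the hand-written reward with a learned reward model.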
Sources
- Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
- Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30.
How to cite this page
ScholarGate. (2026, June 3). Fine-Tuned Reinforcement Learning (Policy Adaptation via Fine-Tuning). ScholarGate. https://scholargate.app/zh/deep-learning/fine-tuned-reinforcement-learning
Which method?
- Fine-tuned BERT classification (Deep learning)
- Fine-tuned Transformer (Deep learning)
- Reinforcement learning (Deep learning)
- Self-supervised reinforcement learning (Deep learning)
- Transfer learning with reinforcement learning (Transfer RL): a training paradigm in which the knowledge an agent acquires on one or more source tasks … (Deep learning)