
Fine-Tuned Reinforcement Learning

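Fine-Tuned Reinforcement Learning adapts a pretrained policy or model to a new task or behavioural objective using reinforcement signals, including human feedback, instead of training from scratch. Popularised by RLHF (reinforcement learning from human feedback), it is a core technique both for aligning large language models and for adapting deep RL agents to specific environments with minimal additional data.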
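As a minimal sketch of this idea, assume a toy setting: the "pretrained" policy is a softmax over four discrete actions with hypothetical logits, the new task supplies a hypothetical reward for each action, and fine-tuning maximises the KL-penalised objective J(π) = E_{a~π}[r(a)] − β·KL(π ‖ π_ref), which is the kind of objective RLHF uses to keep the tuned policy close to its starting point. Rather than estimating the policy gradient from sampled actions (as REINFORCE or PPO would), the sketch computes the exact expected gradient so the result is deterministic. The helper names (`kl_regularized_gradient`, `fine_tune`, `closed_form_optimum`) and the values of β, the learning rate and the step count are illustrative assumptions. For this objective the maximiser is known in closed form, π*(a) ∝ π_ref(a)·exp(r(a)/β), which provides a check on the fine-tuned policy.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_regularized_gradient(logits, ref_logits, rewards, beta):
    """Exact gradient of J = E_pi[r] - beta * KL(pi || pi_ref) w.r.t. the logits.

    For a softmax policy, dJ/dlogit_k = pi(k) * (g(k) - E_pi[g]), where
    g(a) = r(a) - beta * (log pi(a) - log pi_ref(a)) is the KL-shaped reward.
    Sampling-based methods (REINFORCE, PPO) estimate this same quantity.
    """
    pi = softmax(logits)
    pi_ref = softmax(ref_logits)
    shaped = [r - beta * (math.log(p) - math.log(q))
              for r, p, q in zip(rewards, pi, pi_ref)]
    baseline = sum(p * g for p, g in zip(pi, shaped))  # E_pi[g]
    return [p * (g - baseline) for p, g in zip(pi, shaped)]


def fine_tune(ref_logits, rewards, beta=0.5, lr=0.5, steps=5000):
    """Gradient ascent on J, initialised from the pretrained logits (not from scratch)."""
    logits = list(ref_logits)
    for _ in range(steps):
        grad = kl_regularized_gradient(logits, ref_logits, rewards, beta)
        logits = [z + lr * g for z, g in zip(logits, grad)]
    return softmax(logits)


def closed_form_optimum(ref_logits, rewards, beta):
    """Maximiser of J: pi*(a) proportional to pi_ref(a) * exp(r(a) / beta)."""
    return softmax([z + r / beta for z, r in zip(ref_logits, rewards)])


if __name__ == "__main__":
    # Hypothetical toy values (assumption): the pretrained policy favours action 0,
    # while the new task's reward favours action 2.
    ref_logits = [1.0, 0.5, 0.0, -0.5]
    rewards = [0.0, 0.2, 1.0, 0.1]
    tuned = fine_tune(ref_logits, rewards)
    print("pretrained: ", [round(p, 3) for p in softmax(ref_logits)])
    print("fine-tuned: ", [round(p, 3) for p in tuned])
    print("closed form:", [round(p, 3) for p in closed_form_optimum(ref_logits, rewards, 0.5)])
```

The coefficient β sets the trade-off: a small β lets the reward dominate and pulls the policy far from the pretrained one, while a large β keeps it close to π_ref.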



Sources

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30.

How to cite this page

ScholarGate. (2026, June 3). Fine-Tuned Reinforcement Learning (Policy Adaptation via Fine-Tuning). ScholarGate. https://scholargate.app/zh/deep-learning/fine-tuned-reinforcement-learning

Which method?


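Fine-tuning BERT for classification (Deep learning)
Fine-tuning Transformers (Deep learning)
Reinforcement learning (Deep learning)
Self-supervised reinforcement learning (Deep learning)
Transfer learning with reinforcement learning (Transfer RL), a training paradigm in which knowledge an agent acquires in one or more source tasks … (Deep learning)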


Cited in

Multilingual reinforcement learning

