ScholarGate
助手
Machine learningDeep learning / NLP / CV

微调强化学习

微调强化学习(Fine-Tuned Reinforcement Learning)通过强化信号(包括人类反馈)而非从头开始训练,使预训练策略或模型适应新任务或行为目标。它因RLHF而普及,是校准大型语言模型和使深度强化学习智能体适应特定环境(只需最少额外数据)的核心技术。

在 MethodMind 中打开即将推出视频即将推出Download slides

阅读完整方法

仅限会员

使用免费账户登录即可阅读本节。

登录

Method map

The neighbourhood of related methods — select a node to explore.

来源

  1. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744. link
  2. Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30. link

如何引用本页

ScholarGate. (2026, June 3). Fine-Tuned Reinforcement Learning (Policy Adaptation via Fine-Tuning). ScholarGate. https://scholargate.app/zh/deep-learning/fine-tuned-reinforcement-learning

Which method?

Set this method beside its closest kin and read them side by side — the library lays the books on the table; the choice is yours.

Compare side by side

被引用于

ScholarGateFine-Tuned Reinforcement Learning (Fine-Tuned Reinforcement Learning (Policy Adaptation via Fine-Tuning)). 于 2026-06-15 检索自 https://scholargate.app/zh/deep-learning/fine-tuned-reinforcement-learning · 数据集: https://doi.org/10.5281/zenodo.20539026