GPT Model Fine-Tuning

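GPT model fine-tuning adapts a pretrained autoregressive language model to domain-specific data or instruction-following tasks. Typical base models are GPT-2 (introduced by Radford and colleagues at OpenAI in 2019), its successors GPT-3 and GPT-4, or LLaMA. The adaptation is often guided by reinforcement learning from human feedback (RLHF; Ouyang et al., 2022) or direct preference optimization (DPO). It is used for instruction following, domain adaptation, and generation tasks.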
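
To make the preference-based variant concrete, the following is a minimal sketch of the per-pair DPO loss in its commonly cited form. It is not taken from this page: the function name `dpo_loss`, its parameter names, and the default `beta=0.1` are illustrative assumptions. The inputs are assumed to be summed token log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the model being fine-tuned and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss (illustrative sketch; names are hypothetical).

    Each argument is the summed log-probability of a full response.
    beta scales how strongly the policy is pushed away from the reference.
    """
    # How much more (in log space) the policy likes each response
    # than the frozen reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Positive margin means the policy prefers the chosen response
    # more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # written in a numerically stable way for both signs of margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))
    # Policy has shifted probability toward the chosen response.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy equals the reference the loss is log 2 (about 0.693). It falls below that value as the policy shifts probability toward the chosen response and rises above it when the policy favors the rejected one. In practice the log-probabilities would come from a forward pass of the language model over each response, and the loss would be averaged over a batch of preference pairs.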

Sources

  1. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D. & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI Technical Report.
  2. Ouyang, L. et al. (2022). Training Language Models to Follow Instructions with Human Feedback. NeurIPS. DOI: 10.48550/arXiv.2203.02155

How to cite this page

ScholarGate. (2026, June 1). GPT Fine-Tuning and Instruction Adaptation. ScholarGate. https://scholargate.app/zh/deep-learning/gpt-finetuning

Cited in

ScholarGate. GPT Fine-Tuning (GPT Fine-Tuning and Instruction Adaptation). Retrieved 2026-06-15 from https://scholargate.app/zh/deep-learning/gpt-finetuning · Dataset: https://doi.org/10.5281/zenodo.20539026