GPT Model Fine-Tuning

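GPT model fine-tuning adapts a pretrained autoregressive language model to domain-specific data or instruction-following tasks. Typical base models include OpenAI's GPT-2 (Radford et al., 2019) and its successors GPT-3 and GPT-4, as well as Meta's LLaMA. The adapted model can be further aligned with human preferences through reinforcement learning from human feedback (RLHF; Ouyang et al., 2022) or direct preference optimization (DPO). Common uses are instruction following, domain adaptation, and text generation.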
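
To make the DPO objective mentioned above more concrete, the following is a minimal sketch of the loss for a single preference pair, written in plain Python. It assumes the caller has already computed sequence-level log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under both the model being tuned and a frozen reference model. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are illustrative choices, not taken from the source or from any particular library.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair (illustrative sketch).

    All *_logp arguments are assumed to be summed token log-probabilities
    of a full response given the prompt. beta is a hypothetical default
    that scales how strongly the policy is pushed away from the reference.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # The loss is -log(sigmoid(z)) with z as below.
    z = beta * (chosen_margin - rejected_margin)

    # Numerically stable form of -log(sigmoid(z)) = log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Usage: a policy identical to the reference gives a loss of log(2).
baseline = dpo_loss(-3.0, -3.0, -3.0, -3.0)
# A policy that has shifted probability toward the chosen response
# is rewarded with a lower loss.
improved = dpo_loss(-1.0, -5.0, -3.0, -3.0)
```

In practice the log-probabilities would come from forward passes of the two models over batches of preference pairs, and the loss would be averaged and backpropagated through the policy only. This sketch leaves those parts out to keep the objective itself visible.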

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI Technical Report.
Ouyang, L. et al. (2022). Training Language Models to Follow Instructions with Human Feedback. NeurIPS. DOI: 10.48550/arXiv.2203.02155

ScholarGate. (2026, June 1). GPT Fine-Tuning and Instruction Adaptation. ScholarGate. https://scholargate.app/zh/deep-learning/gpt-finetuning
