GPT Model Fine-Tuning
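GPT fine-tuning adapts a pretrained autoregressive language model, such as OpenAI's GPT-2 (Radford et al., 2019), its successors GPT-3 and GPT-4, or Meta's LLaMA, to domain-specific data or instruction-following tasks. The adaptation is commonly done with supervised fine-tuning and can be further guided by reinforcement learning from human feedback (RLHF; Ouyang et al., 2022) or direct preference optimization (DPO). It is used for instruction following, domain adaptation, and generation tasks.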
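To make the preference-based step more concrete, the following is a minimal sketch of the DPO loss for a single preference pair, following the published DPO objective. The page itself only names DPO, so the function name `dpo_loss`, the argument names, and the default `beta=0.1` are illustrative assumptions. The inputs are assumed to be sequence-level log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the model being tuned and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper, not a library API).

    Each argument is the summed log-probability of a full response.
    beta scales how strongly the policy is pushed away from the reference;
    0.1 is an assumed illustrative value.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # The loss is -log(sigmoid(x)) with x = beta * (difference of margins).
    x = beta * (chosen_margin - rejected_margin)

    # Numerically stable form of -log(sigmoid(x)) = log(1 + exp(-x)).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2. It falls below that value once the policy favours the chosen response more strongly than the reference does, and rises above it in the opposite case. In practice this quantity would be averaged over a batch and computed on tensors so that gradients can flow to the policy.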
Sources
- Radford, A., Wu, J., Child, R., Luan, D., Amodei, D. & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI Technical Report.
- Ouyang, L. et al. (2022). Training Language Models to Follow Instructions with Human Feedback. NeurIPS. DOI: 10.48550/arXiv.2203.02155
How to cite this page
ScholarGate. (2026, June 1). GPT Fine-Tuning and Instruction Adaptation. ScholarGate. https://scholargate.app/zh/deep-learning/gpt-finetuning
Which method?
Closely related methods to compare side by side:
- LoRA and PEFT (deep learning)
- Random Forest (machine learning)
- Variational Autoencoder (deep learning)
- Vision Transformer (deep learning)
- XGBoost (machine learning)