Compare methods
Review the methods you selected side by side; rows that differ are marked ≠.
| | Fine-Tuned Reinforcement Learning | Fine-Tuned Transformer |
|---|---|---|
| Domain | Deep learning | Deep learning |
| Family | Machine learning | Machine learning |
| Year of origin ≠ | 2017–2022 | 2017–2019 |
| Originators ≠ | Christiano, P. et al.; Ouyang, L. et al. | Vaswani et al. (architecture); fine-tuning paradigm popularized by Howard & Ruder and Devlin et al. |
| Type ≠ | Policy adaptation via fine-tuning | Transfer learning / supervised fine-tuning |
| Foundational source ≠ | Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744. | Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. |
| Alternative names | RL fine-tuning, policy fine-tuning, RLHF, reinforcement learning from human feedback | Transformer fine-tuning, pre-trained transformer fine-tuning, task-adaptive transformer, downstream-tuned transformer |
| Related methods ≠ | 5 | 4 |
| Summary ≠ | Fine-Tuned Reinforcement Learning adapts a pre-trained policy or model to a new task or behavioral objective using reinforcement signals, including human feedback, rather than retraining from scratch. Popularized by RLHF, it is a core technique for aligning large language models and for adapting deep RL agents to specialized environments with little additional data. | Fine-tuning a Transformer adapts a large pre-trained model (such as BERT, GPT, or ViT) to a specific downstream task by continuing gradient-based training on a labeled target dataset. This two-stage paradigm (pre-train, then fine-tune) has achieved state-of-the-art results on many NLP and computer vision benchmarks while needing far less task-specific data than training from scratch. |

ScholarGate dataset
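The two summaries differ mainly in the training signal: a labeled target per example versus a scalar reward for a sampled output. The sketch below contrasts them on a toy, assuming a single "pre-trained" softmax over three outputs stands in for a large model. It is not RLHF as described by Ouyang et al. (2022), which learns a reward model from human comparisons and optimizes with PPO; here the reward function is a hand-written placeholder, the RL update is plain REINFORCE with a running-mean baseline, and only the KL penalty toward the frozen pre-trained policy mirrors the RLHF setup. All function names and hyperparameters are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_finetune(logits, labels, lr=0.5, epochs=20):
    """Supervised fine-tuning: keep descending the cross-entropy on labeled targets.

    For softmax outputs, d(CE)/d(logit_k) = p_k - 1[k == y].
    """
    theta = list(logits)  # copy so the pre-trained logits stay untouched
    for _ in range(epochs):
        for y in labels:
            p = softmax(theta)
            for k in range(len(theta)):
                theta[k] -= lr * (p[k] - (1.0 if k == y else 0.0))
    return theta


def rl_finetune(logits, reward_fn, lr=0.5, steps=2000, beta=0.1, seed=0):
    """RL fine-tuning: REINFORCE on a KL-shaped reward (no labels needed).

    Shaped reward: r(a) - beta * (log pi(a) - log pi_ref(a)), where pi_ref is
    the frozen pre-trained policy. A running-mean baseline reduces variance.
    """
    rng = random.Random(seed)
    ref = softmax(logits)  # frozen reference (pre-trained) policy
    theta = list(logits)
    baseline = 0.0
    for _ in range(steps):
        p = softmax(theta)
        a = rng.choices(range(len(theta)), weights=p)[0]  # sample an output
        shaped = reward_fn(a) - beta * (math.log(p[a]) - math.log(ref[a]))
        advantage = shaped - baseline
        baseline += 0.05 * (shaped - baseline)
        for k in range(len(theta)):
            # gradient ascent: d log pi(a) / d logit_k = 1[k == a] - p_k
            theta[k] += lr * advantage * ((1.0 if k == a else 0.0) - p[k])
    return theta


# Hypothetical "pre-trained" logits over three possible outputs.
pretrained = [1.0, 0.5, 0.0]

# Supervised: a labeled dataset says output 2 is correct.
sft = supervised_finetune(pretrained, labels=[2, 2, 2, 2])

# RL: no labels, only a hypothetical reward signal that prefers output 2.
rl = rl_finetune(pretrained, reward_fn=lambda a: 1.0 if a == 2 else 0.0)

print("pre-trained:", [round(x, 3) for x in softmax(pretrained)])
print("supervised: ", [round(x, 3) for x in softmax(sft)])
print("RL:         ", [round(x, 3) for x in softmax(rl)])
```

Both routes start from the same weights and shift probability mass toward output 2; what differs is what each update consumes. The `beta` term penalizes drift away from the pre-trained policy, so a larger `beta` keeps the RL-tuned policy closer to where it started.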