
Fine-Tuned Reinforcement Learning

Fine-tuned reinforcement learning adapts a pretrained policy or model to a new task or behavioral objective by using reinforcement signals, including human feedback, instead of retraining from scratch. It became widely known through reinforcement learning from human feedback (RLHF), and it is the main technique used to align large language models and to adapt deep RL agents to specific environments with only a small amount of additional data.
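The idea of starting from a pretrained policy and nudging it with a reward signal can be illustrated with a deliberately small sketch. The example below is not taken from the cited papers: it assumes a three-action softmax policy (a bandit, with no states), uses the exact expected policy gradient instead of sampled rollouts, and adds a KL penalty toward the pretrained policy, a regularizer commonly used in RLHF-style fine-tuning. The names `fine_tune`, `pretrained_logits`, `rewards`, `beta`, and `lr`, and all numeric values, are hypothetical choices for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def fine_tune(pretrained_logits, rewards, beta=0.1, lr=0.5, steps=500):
    """Gradient ascent on E_pi[reward] - beta * KL(pi || pi_pretrained).

    Assumption: a stateless softmax policy, so the expected policy
    gradient can be computed exactly rather than estimated from samples.
    """
    ref = softmax(pretrained_logits)   # frozen pretrained (reference) policy
    logits = list(pretrained_logits)   # start from pretrained weights, not from scratch
    for _ in range(steps):
        pi = softmax(logits)
        # KL-shaped reward: task reward minus a penalty for drifting from ref.
        shaped = [r - beta * (math.log(p) - math.log(q))
                  for r, p, q in zip(rewards, pi, ref)]
        baseline = sum(p * s for p, s in zip(pi, shaped))
        # Exact gradient of the objective with respect to each logit.
        logits = [t + lr * p * (s - baseline)
                  for t, p, s in zip(logits, pi, shaped)]
    return softmax(logits), ref


# Hypothetical setup: the pretrained policy prefers action 0,
# but the new reward signal prefers action 1.
pretrained_logits = [2.0, 0.5, 0.0]
rewards = [0.0, 1.0, 0.2]

tuned, ref = fine_tune(pretrained_logits, rewards, beta=0.1)
cautious, _ = fine_tune(pretrained_logits, rewards, beta=1.0)

print("pretrained:", [round(p, 3) for p in ref])
print("tuned (beta=0.1):", [round(p, 3) for p in tuned])
print("tuned (beta=1.0):", [round(p, 3) for p in cautious])
```

In this sketch a small `beta` lets the policy move most of its probability mass to the rewarded action, while a larger `beta` keeps the fine-tuned policy closer to the pretrained one. Practical systems replace the exact gradient with sampled estimates (for example PPO) and the fixed reward table with a learned reward model or environment feedback.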

Sources

  1. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
  2. Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30.

How to cite this page

ScholarGate. (2026, June 3). Fine-Tuned Reinforcement Learning (Policy Adaptation via Fine-Tuning). ScholarGate. https://scholargate.app/sw/deep-learning/fine-tuned-reinforcement-learning

Referenced by

ScholarGate. Fine-Tuned Reinforcement Learning (Policy Adaptation via Fine-Tuning). Retrieved 2026-06-15 from https://scholargate.app/sw/deep-learning/fine-tuned-reinforcement-learning · Dataset: https://doi.org/10.5281/zenodo.20539026