
Policy Gradient Methods

Policy gradient methods are reinforcement learning algorithms that optimize a parameterized policy directly, by gradient ascent on the expected return, rather than learning action values and acting greedily with respect to them. Building on Ronald Williams's 1992 REINFORCE algorithm and the policy gradient theorem of Sutton and colleagues (2000), they handle stochastic policies and continuous action spaces naturally, and they underpin modern actor-critic and deep RL algorithms.
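To make "gradient ascent on the expected return" concrete, here is a minimal sketch of a REINFORCE-style update, not a reference implementation. It assumes a hypothetical two-armed Bernoulli bandit (one-step episodes), a softmax policy over action preferences `theta`, and a running-average baseline for variance reduction. The function names, reward probabilities, step count, and learning rate are all illustrative choices and do not come from the sources below.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_softmax(probs, action):
    """Gradient of log pi(action) w.r.t. the preferences: one_hot(action) - probs."""
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def reinforce_bandit(reward_probs, steps=5000, lr=0.1, seed=0):
    """REINFORCE on a hypothetical Bernoulli bandit; returns the final policy."""
    rng = random.Random(seed)
    theta = [0.0] * len(reward_probs)  # policy parameters (action preferences)
    baseline = 0.0                     # running average of observed returns
    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current stochastic policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # One-step return: reward 1 with the arm's (assumed) success probability.
        ret = 1.0 if rng.random() < reward_probs[action] else 0.0
        grad = grad_log_softmax(probs, action)
        # Ascent step: theta += lr * (return - baseline) * grad log pi(action)
        theta = [th + lr * (ret - baseline) * g for th, g in zip(theta, grad)]
        baseline += (ret - baseline) / t
    return softmax(theta)


final_probs = reinforce_bandit([0.2, 0.8])
print(final_probs)
```

No action values are learned or maximized here: the policy's own parameters move in the direction that makes well-rewarded actions more probable. Replacing the running-average baseline with a learned value estimate is, roughly, the step toward actor-critic methods.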

Sources

Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4), 229–256. DOI: 10.1007/BF00992696
Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 12, 1057–1063.

How to cite this page

ScholarGate. (2026, June 2). Policy Gradient Methods (REINFORCE / Actor-Critic). ScholarGate. https://scholargate.app/bn/machine-learning/policy-gradient
