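Bayesian Dynamic Programming: Sequential decision optimization with Bayesian belief updating
Bayesian Dynamic Programming (BDP) combines Bellman's dynamic programming framework with Bayesian inference to optimize sequential decisions when the transition probabilities or the reward structure are unknown. At each stage, the agent uses the observed outcomes to update its beliefs about the environment, then computes an optimal policy that explicitly accounts for both the immediate reward and the value of the information gained through exploration.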
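As a rough illustration of the belief-update-then-optimize loop described above, the following minimal sketch solves a small finite-horizon problem by backward recursion over belief states. The setting is an assumption chosen for brevity and does not come from this page: a two-armed Bernoulli bandit in which one arm has a known success probability and the other has an unknown probability with a Beta prior. The function name `bayes_adaptive_value` and its parameters are hypothetical.

```python
from functools import lru_cache


def bayes_adaptive_value(horizon, known_p, alpha=1, beta=1):
    """Return (expected total reward, best first action) for a toy problem.

    Assumed setting (illustrative only): a "known" arm pays 1 with
    probability known_p; an "unknown" arm pays 1 with an unknown
    probability whose prior is Beta(alpha, beta).
    """

    @lru_cache(maxsize=None)
    def value(a, b, steps_left):
        # The belief state is the Beta posterior (a, b) plus the time left.
        if steps_left == 0:
            return 0.0, None

        # Known arm: immediate expected reward, belief stays unchanged.
        safe = known_p + value(a, b, steps_left - 1)[0]

        # Unknown arm: average over both outcomes, each followed by the
        # Bayesian update of the posterior (success -> a+1, failure -> b+1).
        mean = a / (a + b)
        risky = (mean * (1.0 + value(a + 1, b, steps_left - 1)[0])
                 + (1.0 - mean) * value(a, b + 1, steps_left - 1)[0])

        if risky >= safe:
            return risky, "unknown"
        return safe, "known"

    return value(alpha, beta, horizon)


if __name__ == "__main__":
    for h in (1, 2):
        total, action = bayes_adaptive_value(horizon=h, known_p=0.55)
        print(h, action, round(total, 4))
```

With a uniform Beta(1, 1) prior and `known_p=0.55`, a single remaining step favors the known arm, because the unknown arm's posterior mean is only 0.5. With two steps, the recursion prefers the unknown arm first: a success raises the posterior mean to 2/3, and that possibility is worth more than the small immediate loss. This is the value of information that the recursion accounts for by planning over belief states rather than over a fixed model.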
Sources
- Bertsekas, D. P. (1995). Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA. ISBN: 9781886529267
- Duff, M. O. (2002). Optimal Learning: Computational procedures for Bayes-adaptive Markov decision processes. PhD Dissertation, University of Massachusetts Amherst.
How to cite this page
ScholarGate. (2026, June 3). Bayesian Dynamic Programming — Sequential decision optimization under uncertainty with Bayesian belief updating. ScholarGate. https://scholargate.app/zh/simulation/bayesian-dynamic-programming