Bayesian Dynamic Programming: sequential decision optimization with Bayesian belief updating

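Bayesian Dynamic Programming (BDP) combines Bellman's dynamic programming framework with Bayesian inference to optimize sequential decisions when the transition probabilities or the reward structure are unknown. At each stage, the agent uses the observed outcomes to update its beliefs about the environment, then computes an optimal policy that explicitly accounts for both the immediate reward and the value of the information gained through exploration.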
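The update-then-plan loop can be illustrated on a small Bernoulli bandit. This is a minimal sketch under stated assumptions, not the method's reference implementation. It assumes each arm pays 1 with an unknown probability, beliefs are Beta(a, b) distributions, and the horizon is finite. The function name `bayes_adaptive_value` and its parameters are hypothetical, chosen only for this example. The belief itself serves as the dynamic programming state, so the value of an action already includes what the agent expects to learn from it.

```python
from functools import lru_cache


def bayes_adaptive_value(horizon, prior=((1, 1), (1, 1))):
    """Return (expected total reward, first arm to pull) under the Bayes-optimal policy.

    Illustrative assumptions: Bernoulli rewards in {0, 1}, one Beta(a, b)
    belief per arm given as an (a, b) pair, and a finite number of pulls.
    """

    @lru_cache(maxsize=None)
    def value(belief, steps_left):
        # No pulls left: nothing more to earn.
        if steps_left == 0:
            return 0.0, None

        best_q, best_arm = float("-inf"), None
        for arm, (a, b) in enumerate(belief):
            # Posterior mean of the arm = predictive probability of a success.
            p = a / (a + b)
            # Bayesian belief update for each possible outcome of this pull.
            after_win = belief[:arm] + ((a + 1, b),) + belief[arm + 1:]
            after_loss = belief[:arm] + ((a, b + 1),) + belief[arm + 1:]
            # Bellman backup over belief states: immediate reward plus the
            # value of continuing optimally from the updated belief.
            q = (p * (1.0 + value(after_win, steps_left - 1)[0])
                 + (1.0 - p) * value(after_loss, steps_left - 1)[0])
            if q > best_q:  # strict comparison: ties go to the lowest arm index
                best_q, best_arm = q, arm
        return best_q, best_arm

    return value(tuple(tuple(ab) for ab in prior), horizon)


if __name__ == "__main__":
    total, first_arm = bayes_adaptive_value(horizon=2)
    print(total, first_arm)
```

With two arms, uniform Beta(1, 1) priors, and a horizon of 2, the Bayes-optimal value is 13/12, whereas a policy that ignores what it observes earns 1.0 in expectation. The difference is the value of information in this tiny problem. The number of belief states grows quickly with the horizon and the number of arms, so exact recursion of this kind is only practical for small instances.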

Related methods

Bayesian Markov model, dynamic programming, reinforcement learning, stochastic dynamic programming, Bayesian goal programming, Bayesian linear programming, Bayesian sensitivity analysis

Sources

Bertsekas, D. P. (1995). Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA. ISBN: 9781886529267
Duff, M. O. (2002). Optimal Learning: Computational procedures for Bayes-adaptive Markov decision processes. PhD Dissertation, University of Massachusetts Amherst.

How to cite this page

ScholarGate. (2026, June 3). Bayesian Dynamic Programming — Sequential decision optimization under uncertainty with Bayesian belief updating. ScholarGate. https://scholargate.app/zh/simulation/bayesian-dynamic-programming
