ScholarGate

Bayesian Dynamic Programming: Sequential Decision Optimization with Bayesian Belief Updating

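Bayesian Dynamic Programming (BDP) combines Bellman's dynamic programming framework with Bayesian inference to optimize sequential decisions when the transition probabilities or the reward structure are unknown. At each stage, the agent uses the observed outcomes to update its beliefs about the environment, then computes an optimal policy that explicitly accounts for both the immediate reward and the value of the information gained through exploration.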
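The update-then-plan loop described above can be made concrete on a toy problem. The following is a minimal sketch, not the page's own method: it assumes a two-armed Bernoulli bandit, Beta(a, b) beliefs stored as count pairs, undiscounted rewards, and a short finite horizon. The function name `bayes_adaptive_plan` and its parameters are hypothetical, chosen only for illustration.

```python
from functools import lru_cache


def bayes_adaptive_plan(horizon, prior=((1, 1), (1, 1))):
    """Return (expected total reward, best first arm) for a Bernoulli bandit.

    Illustrative assumptions (not from the source): each arm pays 1 with an
    unknown probability, beliefs are Beta(a, b) counts, rewards are
    undiscounted, and the horizon is finite.
    """
    # Beliefs must be hashable so the recursion can be memoized.
    start = tuple(tuple(ab) for ab in prior)

    @lru_cache(maxsize=None)
    def solve(belief, steps_left):
        # No decisions left: nothing more can be earned.
        if steps_left == 0:
            return 0.0, None
        best_value, best_arm = float("-inf"), None
        for arm, (a, b) in enumerate(belief):
            p = a / (a + b)  # posterior mean probability of a reward
            # Bayesian belief update for each possible outcome of this pull.
            after_win = belief[:arm] + ((a + 1, b),) + belief[arm + 1:]
            after_loss = belief[:arm] + ((a, b + 1),) + belief[arm + 1:]
            # Bellman backup over beliefs: immediate reward plus the value
            # of acting optimally under the updated belief.
            q = (p * (1.0 + solve(after_win, steps_left - 1)[0])
                 + (1.0 - p) * solve(after_loss, steps_left - 1)[0])
            if q > best_value:  # ties go to the lower-indexed arm
                best_value, best_arm = q, arm
        return best_value, best_arm

    return solve(start, horizon)
```

Under these assumptions, calling `bayes_adaptive_plan(1, prior=((11, 9), (1, 1)))` selects arm 0, whose posterior mean (0.55) is higher, while `bayes_adaptive_plan(2, prior=((11, 9), (1, 1)))` selects the less-explored arm 1: with one more decision remaining, what is learned from that pull is worth more than the small loss in immediate expected reward. The number of reachable beliefs grows quickly with the horizon, so exact recursion of this kind is only practical for small problems.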


Sources

  1. Bertsekas, D. P. (1995). Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA. ISBN: 9781886529267
  2. Duff, M. O. (2002). Optimal Learning: Computational procedures for Bayes-adaptive Markov decision processes. PhD Dissertation, University of Massachusetts Amherst.

How to cite this page

ScholarGate. (2026, June 3). Bayesian Dynamic Programming — Sequential decision optimization under uncertainty with Bayesian belief updating. ScholarGate. https://scholargate.app/zh/simulation/bayesian-dynamic-programming


Cite as

ScholarGate. Bayesian Dynamic Programming (Bayesian Dynamic Programming: Sequential decision optimization under uncertainty with Bayesian belief updating). Retrieved 2026-06-15 from https://scholargate.app/zh/simulation/bayesian-dynamic-programming · Dataset: https://doi.org/10.5281/zenodo.20539026