Multi-Armed Bandit (UCB, Thompson Sampling)

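A multi-armed bandit (MAB) is an adaptive experimentation framework that allocates trials sequentially among competing arms so as to minimize cumulative regret (the reward lost relative to always playing the best arm) while learning which arm performs best. Formalized by Robbins in 1952 and given finite-time guarantees by Auer et al. (2002), it balances exploring uncertain options against exploiting the option currently believed to be best. When early stopping or cost-sensitive allocation is required, it can outperform a classic A/B test, which typically fixes the allocation in advance.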
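A minimal sketch of the two policies named in the title, assuming Bernoulli (0/1) rewards and hypothetical arm means. UCB1, as analysed by Auer et al. (2002), pulls each arm once and then plays the arm maximizing mean_j + sqrt(2 ln n / n_j), where n is the total number of plays so far and n_j the plays of arm j. Thompson Sampling, in the Beta-Bernoulli form covered by Russo et al. (2018), draws one plausible mean per arm from its Beta posterior and plays the largest draw. The function names `ucb1` and `thompson_bernoulli` and the example conversion rates are illustrative assumptions, not part of any library.

```python
import math
import random


def ucb1(true_means, horizon, rng):
    """UCB1 on simulated Bernoulli arms; returns (pseudo-regret, pulls per arm)."""
    k = len(true_means)
    counts = [0] * k           # n_j: times arm j has been pulled
    sums = [0.0] * k           # total reward observed on arm j
    best = max(true_means)     # mu*: known only because this is a simulation
    regret = 0.0
    for t in range(horizon):
        if t < k:
            arm = t            # initialization: pull every arm once
        else:
            n = sum(counts)    # total plays so far
            # Index = empirical mean + exploration bonus sqrt(2 ln n / n_j)
            arm = max(range(k), key=lambda j: sums[j] / counts[j] + math.sqrt(2.0 * math.log(n) / counts[j]))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - true_means[arm]   # expected loss of this pull
    return regret, counts


def thompson_bernoulli(true_means, horizon, rng):
    """Beta-Bernoulli Thompson Sampling; returns (pseudo-regret, pulls per arm)."""
    k = len(true_means)
    alpha = [1.0] * k          # Beta(1, 1) prior: 1 + observed successes
    beta = [1.0] * k           # 1 + observed failures
    counts = [0] * k
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its posterior, play the argmax
        draws = [rng.betavariate(alpha[j], beta[j]) for j in range(k)]
        arm = max(range(k), key=lambda j: draws[j])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward          # posterior update on a success
        beta[arm] += 1 - reward       # posterior update on a failure
        counts[arm] += 1
        regret += best - true_means[arm]
    return regret, counts


if __name__ == "__main__":
    means = [0.05, 0.04, 0.06]     # hypothetical conversion rates per arm
    for name, policy in [("UCB1", ucb1), ("Thompson", thompson_bernoulli)]:
        regret, counts = policy(means, 10_000, random.Random(0))
        print(f"{name}: regret={regret:.1f}, pulls={counts}")
```

Regret here is pseudo-regret, the sum of (mu* − mu of the arm played) over all steps, which can only be computed in simulation where the true means are known. Both policies tend to shift pulls toward the best arm as evidence accumulates, whereas a fixed equal split would keep sending a third of the traffic to each of the three arms for the whole horizon.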

Related methods

A/B testing (online controlled experiments)
Adaptive clinical trial design
Randomized controlled trial (RCT)
Sequential / group sequential trial design

Sources

Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-Time Analysis of the Multiarmed Bandit Problem. Machine Learning, 47(2–3), 235–256. DOI: 10.1023/A:1013689704352
Russo, D., Van Roy, B., Kazerouni, A., Osband, I., & Wen, Z. (2018). A Tutorial on Thompson Sampling. Foundations and Trends in Machine Learning, 11(1), 1–96. DOI: 10.1561/2200000070

How to cite this page

ScholarGate. (2026, June 1). Multi-Armed Bandit (UCB, Thompson Sampling). ScholarGate. https://scholargate.app/zh/experimental-design/multiarm-bandit

Cited in

A/B testing (online controlled experiments)