Weakly Supervised Vision Transformer

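Weakly Supervised Vision Transformer (WS-ViT) is an approach to training Vision Transformers on image data that lacks precise pixel-level annotation. Instead of such annotation, it uses cheaper, noisier supervisory signals such as image-level class tags, bounding boxes, or text collected from the web. The Transformer's global self-attention mechanism is particularly well suited to this setting: because every image patch can attend to every other patch within a single layer, the model can localize objects and learn discriminative features from these incomplete labels.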
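
To make the two ingredients above concrete, here is a minimal NumPy sketch built on our own assumptions. A single ViT-style self-attention layer runs over patch tokens plus a class token. The model is scored only against image-level tags, written as a multi-hot vector, with a multi-label binary cross-entropy loss. The class token's attention over the patches is then read off as a coarse localization map. Several simplifications apply. The weights are random and untrained, the helper names (`patchify`, `self_attention`, `ws_forward`, `multilabel_bce`) are hypothetical, and bounding-box and web-text supervision are not modeled. The sketch is not the architecture of any specific published WS-ViT.

```python
import numpy as np

# Illustrative sketch only: hypothetical helper names, random untrained weights,
# a single attention layer. Not a published WS-ViT implementation.

def patchify(img, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches (ViT-style)."""
    h, w, c = img.shape
    gh, gw = h // patch, w // patch
    x = img[: gh * patch, : gw * patch]
    x = x.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c), (gh, gw)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, wq, wk, wv):
    """Single-head global self-attention: every token attends to every token."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N+1, N+1), each row sums to 1
    return attn @ v, attn


def ws_forward(img, params, patch):
    """Return image-level logits and the class token's attention map over patches."""
    patches, grid = patchify(img, patch)
    tokens = patches @ params["embed"]               # (N, D) patch embeddings
    tokens = np.vstack([params["cls"], tokens])      # prepend class token -> (N+1, D)
    out, attn = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    logits = out[0] @ params["head"]                 # class scores read from the class token
    heatmap = attn[0, 1:].reshape(grid)              # class-token-to-patch attention as a coarse localization cue
    return logits, heatmap


def multilabel_bce(logits, tags, eps=1e-9):
    """Binary cross-entropy against image-level tags (1 = class present, location unknown)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return float(-np.mean(tags * np.log(p + eps) + (1 - tags) * np.log(1 - p + eps)))


# Usage with toy sizes: a 16x16 RGB image, 4x4 patches, 16-dim tokens, 3 classes.
rng = np.random.default_rng(0)
D, C, P = 16, 3, 4
params = {
    "embed": rng.normal(0, 0.1, (P * P * 3, D)),
    "cls": rng.normal(0, 0.1, (1, D)),
    "wq": rng.normal(0, 0.1, (D, D)),
    "wk": rng.normal(0, 0.1, (D, D)),
    "wv": rng.normal(0, 0.1, (D, D)),
    "head": rng.normal(0, 0.1, (D, C)),
}
img = rng.random((16, 16, 3))
tags = np.array([1.0, 0.0, 1.0])  # weak label: classes 0 and 2 appear somewhere in the image
logits, heatmap = ws_forward(img, params, P)
loss = multilabel_bce(logits, tags)
print(heatmap.shape)  # (4, 4)
```

In real training, the gradient of `loss` would update every parameter, and no pixel mask or box would ever be consulted. The heatmap is a by-product of the attention weights. Reading localization from the class token's attention is a common heuristic in weakly supervised ViT work. Published methods typically refine it, for example by aggregating attention across layers, and this one-layer sketch omits that step.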

Related methods

Knowledge distillation · Self-supervised learning · Semi-supervised learning · Vision Transformer

Sources

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR).
Zhou, Z.-H. (2018). A brief introduction to weakly supervised learning. National Science Review, 5(1), 44–53. DOI: 10.1093/nsr/nwx106

How to cite this page

ScholarGate. (2026, June 3). Weakly Supervised Vision Transformer (WS-ViT). ScholarGate. https://scholargate.app/ja/deep-learning/weakly-supervised-vision-transformer
