Vision Transformer
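Vision Transformer (ViT), proposed by Dosovitskiy and colleagues in 2021, splits an image into fixed-size patches, treats those patches as a sequence, and applies the Transformer's self-attention mechanism to classify the image. When trained on sufficiently large datasets, it outperforms convolutional neural networks (CNNs).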
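The pipeline described above (patches, then a sequence, then self-attention, then a class prediction) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference implementation: it uses a single attention head and a single layer, omits layer normalization and the MLP blocks, and runs with random, untrained weights. The class token and learned position embeddings are details from the ViT design that the summary above does not mention. All function names, parameter names, and sizes (`patchify`, `tiny_vit_forward`, a 32x32 image, 8x8 patches, 16-dimensional tokens, 10 classes) are illustrative choices.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into a sequence of flattened patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over (N, D) tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def init_params(rng, patch_dim, num_patches, dim, num_classes):
    """Random (untrained) parameters; names are illustrative only."""
    def normal(*shape):
        return rng.normal(0.0, 0.02, size=shape)

    return {
        "w_embed": normal(patch_dim, dim),        # linear patch projection
        "cls_token": normal(dim),                 # extra token used for classification
        "pos_embed": normal(num_patches + 1, dim),
        "w_q": normal(dim, dim),
        "w_k": normal(dim, dim),
        "w_v": normal(dim, dim),
        "w_head": normal(dim, num_classes),       # classification head
    }


def tiny_vit_forward(image, params, patch_size):
    """One simplified attention layer followed by a linear classifier."""
    patches = patchify(image, patch_size)                     # (N, P*P*C)
    tokens = patches @ params["w_embed"]                      # (N, D)
    cls = params["cls_token"][None, :]                        # (1, D)
    tokens = np.concatenate([cls, tokens], axis=0)            # (N + 1, D)
    tokens = tokens + params["pos_embed"]                     # add position information
    attended, weights = self_attention(
        tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    tokens = tokens + attended                                # residual connection
    logits = tokens[0] @ params["w_head"]                     # read out the class token
    return softmax(logits), weights


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                # stand-in for a real image
params = init_params(rng, patch_dim=8 * 8 * 3, num_patches=16, dim=16, num_classes=10)
probs, weights = tiny_vit_forward(image, params, patch_size=8)
print(probs.shape, weights.shape)              # (10,) (17, 17)
```

With a 32x32 image and 8x8 patches the sequence has 16 patch tokens plus one class token, so the attention matrix is 17x17: every token attends to every other token regardless of spatial distance. Because the weights are random, the output probabilities carry no meaning; the sketch only shows how the data flows.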
How to cite this page
ScholarGate. (2026, June 1). Vision Transformer (ViT). ScholarGate. https://scholargate.app/zh/deep-learning/vision-transformer
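Cited in
BERT Fine-Tuning
CLIP
Domain-Adaptive Transformer
Domain-Adaptive Vision Transformer
Explainable Vision Transformer
Fine-Tuning Vision Transformer
GPT Model Fine-Tuning
Image Classification
Kolmogorov-Arnold Networks
LoRA and PEFT
Mamba (State Space Model)
Masked Autoencoder
Multilingual Vision Transformer
Multimodal BERT Classification
Multimodal Natural Language Processing
Multimodal Semantic Segmentation
Multimodal Transformer
Multimodal Vision Transformer
Segment Anything Model
Self-Supervised Generative Adversarial Network
Self-Supervised Image Classification
Self-Supervised Instance Segmentation
Self-Supervised Semantic Segmentation
Self-Supervised Vision Transformer
Semi-Supervised Vision Transformer
SimCLR
Spatio-Temporal Graph Convolutional Network
Swin Transformer
TimeGPT
Vision Mamba
Weakly Supervised Object Detection
Weakly Supervised Vision Transformer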