Machine learning
Vision Transformer
The Vision Transformer (ViT), introduced by Dosovitskiy and colleagues in 2021, splits an image into fixed-size patches, treats the patches as a sequence of tokens, and applies the Transformer self-attention mechanism to that sequence for image classification. When pre-trained on sufficiently large datasets, it can match or surpass convolutional neural networks (CNNs) on classification benchmarks.
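The patch-and-attend idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `image_to_patches` and `self_attention`, the 224x224 input, the 16-pixel patch size, and the random projection matrices are all assumptions chosen for illustration. It omits the learned patch embedding, position embeddings, class token, multi-head splitting, and MLP blocks that a full model would need.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into a sequence of flattened square patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    ordered row by row across the patch grid.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) so each patch is contiguous
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a token sequence.

    x has shape (num_tokens, d_in); wq, wk, wv have shape (d_in, d_head).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Subtract the row maximum before exponentiating for numerical stability
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Hypothetical example: a random 224x224 RGB image and 16x16 patches
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
tokens = image_to_patches(image, 16)          # shape (196, 768)

d_head = 64                                   # assumed head dimension
wq, wk, wv = (rng.normal(scale=0.02, size=(tokens.shape[1], d_head)) for _ in range(3))
attended = self_attention(tokens, wq, wk, wv)  # shape (196, 64)
```

With these assumed sizes, the image yields a 14x14 grid, so the sequence has 196 tokens of 768 values each. Every token attends to every other token, which is what lets the model relate distant image regions in a single layer.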
Referenced by
BERT Fine-Tuning
CLIP
Domain-adaptive transformer
Domain-adaptive vision transformer
Explainable Vision Transformer
Fine-Tuned Vision Transformer
GPT Fine-Tuning
Image Classification
Kolmogorov-Arnold Networks
LoRA and PEFT
Mamba (State Space Model)
Masked Autoencoders
Multilingual vision transformer
Multimodal BERT-based Classification
Multimodal NLP
Multimodal Semantic Segmentation
Multimodal Transformer
Multimodal Vision Transformer
Segment Anything Model
Self-supervised GAN
Self-supervised Image Classification
Self-supervised Instance Segmentation
Self-supervised Semantic Segmentation
Self-supervised Vision Transformer
Semi-supervised Vision Transformer
SimCLR
Spatial-Temporal GCN
Swin Transformer
TimeGPT
Vision Mamba
Weakly Supervised Object Detection
Weakly supervised vision transformer