Machine learning

Vision Transformer

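Introduced by Dosovitskiy and colleagues in 2021, the Vision Transformer (ViT) splits an image into fixed-size patches, treats those patches as a sequence of tokens, and applies the Transformer's self-attention mechanism to image classification. When trained on sufficiently large datasets, ViT can match or outperform convolutional neural networks (CNNs) on image classification.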
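The patch-and-attend idea described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference implementation: the 224x224 input size, the embedding width of 64, the random weights, and the helper names `image_to_patches` and `self_attention` are all chosen here for illustration. It also omits the class token, position embeddings, multi-head attention, layer normalization, and MLP blocks that a full ViT uses.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into a sequence of flattened patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) -> (rows * cols, P * P * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, N) patch-to-patch similarities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))  # assumed input size, for illustration only
patch_size, dim = 16, 64           # 16x16 patches; dim is an arbitrary choice

patches = image_to_patches(image, patch_size)                   # (196, 768)
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], dim))  # random, untrained
tokens = patches @ w_embed                                      # (196, 64)

w_q, w_k, w_v = (rng.normal(scale=0.02, size=(dim, dim)) for _ in range(3))
attended, weights = self_attention(tokens, w_q, w_k, w_v)

print(patches.shape, tokens.shape, attended.shape)
# (196, 768) (196, 64) (196, 64)
```

With a 224x224 RGB input and 16x16 patches, the image becomes a sequence of 14 x 14 = 196 tokens, each a flattened vector of 16 x 16 x 3 = 768 values. Every token attends to every other token, which is why the attention matrix is 196 x 196 and why the cost grows quadratically with the number of patches.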

Sources

Dosovitskiy, A. et al. (2021). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR.
Touvron, H. et al. (2021). Training Data-Efficient Image Transformers & Distillation through Attention. ICML.

How to cite this page

ScholarGate. (2026, June 1). Vision Transformer (ViT). ScholarGate. https://scholargate.app/ko/deep-learning/vision-transformer

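Entries that reference this method

BERT Fine-tuning, CLIP, Domain Adaptation Transformer, Domain-Adaptive Vision Transformer, Explainable Vision Transformer, Fine-tuned Vision Transformer, GPT Fine-tuning, Image Classification, Kolmogorov-Arnold Networks, LoRA and PEFT, Mamba (State Space Model), Masked Autoencoders, Multilingual Vision Transformer, Multimodal BERT-based Classification, Multimodal Natural Language Processing, Multimodal Semantic Segmentation, Multimodal Transformer, Multimodal Vision Transformer, Segment Anything Model, Self-supervised GAN, Self-supervised Image Classification, Self-supervised Instance Segmentation, Self-supervised Semantic Segmentation (learns to assign a class label to every pixel of an image without relying on manually annotated pixel-level masks), Self-supervised Vision Transformer, Semi-supervised Vision Transformer, SimCLR, Spatio-Temporal Graph Convolutional Network, Swin Transformer, TimeGPT, Vision Mamba, Weakly Supervised Object Detection, Weakly Supervised Vision Transformer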
