Vision Transformer

The Vision Transformer (ViT), introduced by Dosovitskiy et al. in 2021, splits an image into fixed-size patches, treats those patches as a sequence of tokens, and applies the Transformer's self-attention mechanism to image classification. When trained on enough data, it can outperform convolutional neural networks (CNNs).
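The patch-and-attend idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `patchify` and `self_attention`, the 224x224 input size, the embedding width of 64, and the random weights are all assumptions made for the example. The class token, position embeddings, multi-head attention, and the classification head of the full model are left out.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v, weights


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))       # assumed input size
tokens = patchify(image, 16)            # 14 x 14 = 196 patches of 16*16*3 = 768 values

d = 64                                  # toy embedding width (assumption)
w_embed = rng.normal(scale=0.02, size=(tokens.shape[1], d))
x = tokens @ w_embed                    # linear patch embedding
w_q, w_k, w_v = (rng.normal(scale=0.02, size=(d, d)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)

print(tokens.shape, out.shape)
```

Each row of `weights` shows how strongly one patch attends to every other patch, which is what lets the model relate distant image regions within a single layer.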

References

  1. Dosovitskiy, A. et al. (2021). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR.
  2. Touvron, H. et al. (2021). Training Data-Efficient Image Transformers & Distillation Through Attention. ICML.

How to cite this page

ScholarGate. (2026, June 1). Vision Transformer (ViT). ScholarGate. https://scholargate.app/sw/deep-learning/vision-transformer

Referenced by

ScholarGate. Vision Transformer (ViT). Retrieved 2026-06-15 from https://scholargate.app/sw/deep-learning/vision-transformer · Dataset: https://doi.org/10.5281/zenodo.20539026