
Multilingual Vision Transformer

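The Multilingual Vision Transformer (Multilingual ViT) extends the Vision Transformer architecture to operate across multiple languages, enabling image understanding and image-text reasoning in multilingual and cross-lingual settings. It combines patch-based image encoding with multilingual text representations, so that a single model can serve diverse language communities on tasks such as image captioning, visual question answering, and cross-lingual image retrieval.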
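The two ingredients named above, patch-based image encoding and a text representation shared across languages, can be illustrated with a small sketch. This is not the method's actual implementation: the function names (`patchify`, `cosine_similarity`, `retrieve`) are hypothetical, and the embedding vectors are hand-written stand-ins for what a trained image encoder and a multilingual text encoder might produce. The sketch assumes only that both encoders map into one shared vector space, which is what would let a query in any language retrieve the same image.

```python
import math


def patchify(image, patch_size):
    """Split an H x W grid (list of rows) into flattened, non-overlapping
    patches, returned in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])
    return patches


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, image_embeddings):
    """Return the index of the image embedding most similar to the query."""
    scores = [cosine_similarity(query_embedding, e) for e in image_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 4x4 single-channel image, split into four flattened 2x2 patches.
# In a ViT, each flattened patch would then be linearly projected to a token.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = patchify(image, patch_size=2)

# Hypothetical, hand-written embeddings in a shared image-text space.
image_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
query_en = [0.9, 0.1, 0.0]  # stand-in for an English caption embedding
query_ko = [0.8, 0.2, 0.1]  # stand-in for the same caption in Korean

best_for_en = retrieve(query_en, image_embeddings)
best_for_ko = retrieve(query_ko, image_embeddings)
```

Under these assumed vectors, both queries select the same image, because captions with the same meaning sit close together regardless of language. In a real system that alignment comes from training the encoders, not from hand-picked values.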


Related methods

Multilingual RoBERTa-based classification
Multilingual sentence embeddings
Multimodal Vision Transformer
Vision Transformer
Multilingual image classification

Sources

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2021). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. International Conference on Learning Representations (ICLR 2021).
Bugliarello, E., Liu, F., Pfeiffer, J., Reddy, S., Elliott, D., Erdem, E., Erdem, A., & Lukasiewicz, T. (2022). IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages. International Conference on Machine Learning (ICML 2022).

How to cite this page

ScholarGate. (2026, June 3). Multilingual Vision Transformer (Multilingual ViT). ScholarGate. https://scholargate.app/ko/deep-learning/multilingual-vision-transformer


Entries that reference this method

Multilingual image classification
