
Multimodal Vision Transformer

A multimodal vision transformer (Multimodal ViT) extends the Vision Transformer architecture, using self-attention and cross-attention mechanisms to jointly process and align representations from multiple modalities, typically images and text. By learning an embedding space that is shared or aligned across modalities, it enables tasks such as visual question answering, image-text retrieval, visual grounding, and image captioning.
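The two ingredients named above, cross-attention between modalities and similarity in a shared embedding space, can be illustrated with a minimal NumPy sketch. It is not the implementation of any particular model: the function names (`patchify`, `cross_attention`, `retrieval_scores`), the embedding width `d = 32`, the single attention head, and the random untrained weights are all assumptions made for illustration, and the text tokens are random stand-ins for the output of a real text encoder.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    x = image[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(rows * cols, patch * patch * c)

def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Single-head cross-attention: tokens of one modality attend to the other."""
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights

def retrieval_scores(image_emb, text_emb):
    """Cosine similarity between image and text embeddings in a shared space."""
    a = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    b = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
d = 32                                    # hypothetical embedding width

# Image side: 16x16 patches, linearly projected to d dimensions (as in ViT).
image = rng.random((64, 64, 3))
patches = patchify(image, patch=16)       # (16 patches, 16*16*3 values)
w_patch = rng.normal(scale=0.02, size=(patches.shape[1], d))
image_tokens = patches @ w_patch          # (16, d)

# Text side: random stand-in for 5 embedded text tokens.
text_tokens = rng.normal(size=(5, d))

# Text tokens query the image patches.
w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
fused, attn = cross_attention(text_tokens, image_tokens, w_q, w_k, w_v)

# Pooled embeddings compared in the (here untrained) shared space.
scores = retrieval_scores(image_tokens.mean(axis=0, keepdims=True),
                          text_tokens.mean(axis=0, keepdims=True))
print(fused.shape, attn.shape, scores.shape)
```

Each row of `attn` is a distribution over image patches for one text token, which is the quantity that visual grounding methods typically inspect. With untrained weights the similarity score carries no meaning; in a trained model the two encoders are optimized so that matching image-text pairs score higher than mismatched ones.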



Sources

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2021). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations (ICLR).
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), PMLR 139.

How to cite this page

ScholarGate. (2026, June 3). Multimodal Vision Transformer (Multimodal ViT). ScholarGate. https://scholargate.app/ko/deep-learning/multimodal-vision-transformer


Entries that reference this method

Explainable Vision Transformer, Multilingual Vision Transformer, Multimodal Diffusion Model, Multimodal Instance Segmentation, Multimodal Reinforcement Learning, Self-supervised Vision Transformer
