Multimodal BERT-based Classification

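Multimodal BERT-based classification extends the BERT transformer architecture to jointly encode data from several modalities, most commonly paired text and images. The representations of each modality are fused, and a final classification head assigns the label from the fused representation. The approach came to prominence around 2019 through models such as MMBT and ViLBERT, and it has since become a standard choice for tasks in which neither the text nor the image alone carries enough information for accurate labeling.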
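The fuse-then-classify idea can be illustrated with a toy sketch in plain Python. It assumes fusion by concatenation: image feature vectors are linearly projected into the token embedding space, joined with the text token embeddings behind a `[CLS]` vector, passed through one self-attention layer, and classified from the `[CLS]` position. All names (`classify`, `W_img`, `seg_text`, and so on), the dimensions, and the random weights are hypothetical stand-ins for trained parameters; a real model would use a pretrained BERT encoder with many layers, and two-stream designs fuse the modalities differently.

```python
import math
import random

random.seed(0)

D = 8            # shared hidden size (toy value, far smaller than a real BERT)
IMG_DIM = 16     # size of each raw image feature vector (assumed)
NUM_CLASSES = 3  # number of target labels (assumed)


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(X, W):
    """Multiply each row vector of X (n x a) by W (a x b), giving n x b."""
    return [[sum(x[k] * W[k][j] for k in range(len(W))) for j in range(len(W[0]))]
            for x in X]


def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(H, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the fused sequence."""
    Q, K, V = project(H, Wq), project(H, Wk), project(H, Wv)
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(D)])
    return out


def classify(text_emb, image_feats, p):
    """Fuse text and image inputs, then classify from the [CLS] position."""
    # Project image features into the token embedding space.
    img_emb = project(image_feats, p["W_img"])
    # Add a segment embedding so the encoder can tell the modalities apart.
    text_part = [[x + s for x, s in zip(t, p["seg_text"])] for t in text_emb]
    img_part = [[x + s for x, s in zip(t, p["seg_img"])] for t in img_emb]
    # Fusion by concatenation: [CLS] + text tokens + image tokens.
    H = [p["cls"]] + text_part + img_part
    H = self_attention(H, p["Wq"], p["Wk"], p["Wv"])
    # The classification head reads only the fused [CLS] vector.
    logits = project([H[0]], p["W_out"])[0]
    return softmax(logits)


params = {
    "W_img": rand_matrix(IMG_DIM, D),
    "seg_text": rand_matrix(1, D)[0],
    "seg_img": rand_matrix(1, D)[0],
    "cls": rand_matrix(1, D)[0],
    "Wq": rand_matrix(D, D),
    "Wk": rand_matrix(D, D),
    "Wv": rand_matrix(D, D),
    "W_out": rand_matrix(D, NUM_CLASSES),
}

text_emb = rand_matrix(5, D)             # 5 text token embeddings
image_feats = rand_matrix(3, IMG_DIM)    # 3 image feature vectors
probs = classify(text_emb, image_feats, params)
print(probs)  # one probability per class
```

Because the `[CLS]` vector attends over both the text and the image positions, the predicted distribution depends on both inputs, which is the property that makes this family of models useful when a single modality is ambiguous.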

Sources

Kiela, D., Bhooshan, S., Firooz, H., Perez, E., & Testuggine, D. (2019). Supervised multimodal bitransformers for classifying images and text. arXiv preprint arXiv:1909.02950.
Lu, J., Batra, D., Parikh, D., & Lee, S. (2019). ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in Neural Information Processing Systems, 32.

How to cite this page

ScholarGate. (2026, June 3). Multimodal BERT-based Classification (Transformer Fusion of Text and Non-text Modalities). ScholarGate. https://scholargate.app/ko/deep-learning/multimodal-bert-based-classification

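Entries that reference this method

Multimodal Convolutional Neural Network, Multimodal Diffusion Model, Multimodal Doc2Vec, Multimodal Graph Neural Network, Multimodal GRU, Multimodal Image Classification, Multimodal LDA Topic Model, Multimodal Named Entity Recognition, Multimodal Question Answering, Multimodal Recurrent Neural Network, Multimodal RoBERTa-based Classification, Multimodal Text Summarization, Multimodal Topic Modeling, Multimodal Transformer, Multimodal Vision Transformer, Multimodal Word2Vec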
