Multimodal Image Classification

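Multimodal image classification extends standard visual classification by combining image features with additional modalities such as text captions, audio, or structured metadata. A separate encoder processes each modality, the resulting representations are fused, and a joint classifier assigns the target label. Models such as CLIP show that image-text alignment at scale enables zero-shot and few-shot image classification.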
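The encode, fuse, and classify pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation. The encoders are single random linear layers standing in for trained networks, fusion is plain concatenation (one of several possible strategies), and every name, dimension, and embedding value here is hypothetical. The second function shows the CLIP-style zero-shot idea in its simplest form: choose the label whose text embedding has the highest cosine similarity to the image embedding.

```python
import math
import random


def linear(x, weights, bias):
    """Apply y = W x + b using plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Random weights standing in for a trained layer (assumption)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def late_fusion_predict(image_feat, text_feat, params):
    """Encode each modality separately, concatenate, then classify jointly."""
    img_emb = [math.tanh(v) for v in linear(image_feat, *params["image_encoder"])]
    txt_emb = [math.tanh(v) for v in linear(text_feat, *params["text_encoder"])]
    fused = img_emb + txt_emb  # fusion by concatenation
    return softmax(linear(fused, *params["classifier"]))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_predict(image_emb, label_embs):
    """Pick the label whose text embedding is most similar to the image embedding."""
    scores = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    return max(scores, key=scores.get)


rng = random.Random(0)
params = {
    "image_encoder": init_layer(4, 6, rng),  # 6-dim image features -> 4-dim embedding
    "text_encoder": init_layer(4, 3, rng),   # 3-dim text features -> 4-dim embedding
    "classifier": init_layer(3, 8, rng),     # 8-dim fused vector -> 3 classes
}
image_feat = [rng.random() for _ in range(6)]
text_feat = [rng.random() for _ in range(3)]
probs = late_fusion_predict(image_feat, text_feat, params)

# Toy embeddings in a shared space; a real system would obtain these from trained encoders.
label_embs = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
prediction = zero_shot_predict([0.9, 0.1, 0.0], label_embs)
```

In practice the encoders would be trained networks from a deep learning framework, and the fusion step could be replaced by attention or gating. The structure of separate encoders feeding a joint classifier would stay the same.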

Related methods

Fine-tuned image classification, image classification, multimodal BERT-based classification, multimodal object detection, multimodal sentence embeddings, multimodal transformers, multilingual image classification

Sources

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. Proceedings of the 38th International Conference on Machine Learning (ICML), PMLR 139, 8748–8763.
Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., & Ng, A. Y. (2011). Multimodal deep learning. Proceedings of the 28th International Conference on Machine Learning (ICML), 689–696.

How to cite this page

ScholarGate. (2026, June 3). Multimodal Image Classification (Vision + Auxiliary Modality Fusion). ScholarGate. https://scholargate.app/ko/deep-learning/multimodal-image-classification

Entries that reference this method

Multilingual image classification, multimodal object detection
