CLIP — Contrastive Language-Image Pretraining

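CLIP (Contrastive Language-Image Pretraining) is a vision-language model introduced in 2021 by Radford et al. at OpenAI. It jointly learns aligned image and text representations through a contrastive learning objective over 400 million image-text pairs collected from the internet. This enables zero-shot transfer to image classification tasks without task-specific fine-tuning.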
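The contrastive objective and the zero-shot step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`contrastive_loss`, `zero_shot_classify`), the toy 4-dimensional embeddings, and the fixed temperature of 0.07 are assumptions made for illustration (the paper treats the temperature as a learned parameter, and real embeddings come from trained image and text encoders). The sketch assumes that row i of each embedding matrix forms a matched pair and that all other pairings in the batch act as negatives.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over an N x N similarity matrix.

    Row i of image_emb and row i of text_emb are assumed to be a matched pair;
    every other combination in the batch serves as a negative.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature  # (N, N) scaled cosine similarities
    diag = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)


def zero_shot_classify(image_emb, class_text_emb):
    """For each image, return the index of the most similar class text embedding."""
    sims = l2_normalize(image_emb) @ l2_normalize(class_text_emb).T
    return sims.argmax(axis=1)


# Toy data: hypothetical embeddings standing in for encoder outputs.
text_emb = np.eye(3, 4)                      # three "captions" in a 4-D space
aligned_images = text_emb + 0.1              # each image lies near its own caption
shuffled_images = aligned_images[[1, 2, 0]]  # same images, pairs deliberately mismatched

loss_aligned = contrastive_loss(aligned_images, text_emb)
loss_shuffled = contrastive_loss(shuffled_images, text_emb)
predictions = zero_shot_classify(aligned_images, text_emb)

print(loss_aligned < loss_shuffled)  # matched pairs give the lower loss
print(predictions)
```

In this toy setup the loss is small when each image sits next to its own caption and large when the pairing is shuffled, which is the signal that training uses to pull matched pairs together. Zero-shot classification reuses the same similarity computation: each class name is embedded as text, and an image is assigned to the class with the highest cosine similarity, so no classifier has to be trained for the new task.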


Related methods

ResNet (Residual Network), Vision Transformer, multimodal BERT-based classification, multimodal sentence embeddings

Sources

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 8748–8763.
Radford, A., et al. (2021). Learning Transferable Visual Models From Natural Language Supervision. arXiv:2103.00020.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. ISBN: 978-0-262-03561-3.

How to cite this page

ScholarGate. (2026, June 3). Contrastive Language-Image Pretraining. ScholarGate. https://scholargate.app/ko/deep-learning/clip


Entries that reference this method

Multimodal BERT-based classification, multimodal sentence embeddings
