ScholarGate

Method comparison

Review your selected methods side by side; rows that differ are highlighted.

Attribute | CLIP | ResNet (Residual Network)
Field | Deep learning | Deep learning
Family | Machine learning | Machine learning
Year introduced | 2021 | 2016
Authors | Radford, A.; Kim, J. W.; et al. (OpenAI) | He, K.; Zhang, X.; Ren, S.; Sun, J.
Type | Contrastive vision-language pretraining model | Deep convolutional neural network with skip connections
Related | 2 | 4

Foundational source
CLIP: Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 8748–8763.
ResNet: He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778.

Other names
CLIP: CLIP, Contrastive Language-Image Pre-training, zero-shot image classifier, visual-language model
ResNet: ResNet, Residual Network, Deep Residual Learning, ResNet-50

Summary
CLIP: CLIP (Contrastive Language-Image Pre-training) is a vision-language model introduced by Radford et al. at OpenAI in 2021. It jointly learns aligned image and text representations by training on 400 million internet-sourced image-text pairs with a contrastive objective. This enables zero-shot transfer to image classification tasks without any task-specific fine-tuning.
ResNet: ResNet (Residual Network) is a deep convolutional neural network architecture introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun at CVPR 2016. It adds shortcut (skip) connections that carry a block's input directly to its output, so that each block learns a residual correction rather than a full mapping. This made it possible to train networks with hundreds or even thousands of layers without the degradation problem (accuracy saturating and then worsening as depth grows) that had previously made very deep networks impractical. An ensemble of residual networks won the ILSVRC 2015 image classification task with a top-5 error of 3.57%, and ResNet remains one of the most widely used backbone architectures in computer vision.
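The two core ideas in the summaries, a residual block that outputs F(x) + x and a symmetric contrastive objective over paired image and text embeddings, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not either paper's implementation: the function names are hypothetical, embeddings are plain lists of floats, the temperature is fixed at 0.07 (in CLIP it is a learned parameter), and the residual function `f` is any callable that preserves the input length.

```python
import math


def residual_block(x, f):
    """Return f(x) + x: f models the residual, the shortcut carries x unchanged."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("identity shortcut needs matching dimensions")
    return [a + b for a, b in zip(fx, x)]


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image-text pairs.

    Pair i (image_embs[i], text_embs[i]) is the positive; every other
    combination in the batch is treated as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


def zero_shot_classify(image_emb, label_embs):
    """Pick the label whose text embedding is most similar to the image embedding."""
    img = l2_normalize(image_emb)
    scores = [dot(img, l2_normalize(t)) for t in label_embs]
    return max(range(len(scores)), key=scores.__getitem__)
```

With a residual function that returns all zeros, `residual_block` reduces to the identity, which is the intuition behind stacking many such blocks without hurting accuracy. Zero-shot classification in this sketch needs no extra training: the candidate labels are embedded as text and compared with the image embedding directly.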
ScholarGate dataset record (identical for both methods): v1, 3 sources, PUBLISHED

ScholarGate. Method comparison: CLIP · ResNet. Retrieved 2026-06-17 from https://scholargate.app/fa/compare