ScholarGate

Compare methods

Review the methods you selected side by side; rows that differ are highlighted.

| | Multimodal Transformers | Sentence Embeddings |
| --- | --- | --- |
| Field | Deep learning | Deep learning |
| Family | Machine learning | Machine learning |
| Year of origin | 2019–2021 | 2015–2019 |
| Originator | Lu et al. (ViLBERT); Radford et al. (CLIP) | Kiros et al. (Skip-Thought, 2015); Reimers & Gurevych (Sentence-BERT, 2019) |
| Type | Cross-modal attention-based deep learning model | Representation learning / embedding |
| Foundational source | Lu, J., Batra, D., Parikh, D., & Lee, S. (2019). ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. Advances in Neural Information Processing Systems (NeurIPS), 32. | Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), 3980–3990. |
| Alternative names | multimodal attention model, cross-modal transformer, vision-language transformer, multi-modal fusion transformer | sentence vectors, sentence representations, SBERT, semantic sentence encoding |
| Related | 5 | 4 |
| Summary | A Multimodal Transformer extends the standard Transformer architecture to process and jointly reason over two or more input modalities: most commonly text and images, but also audio, video, or structured data. Cross-modal attention layers allow information from one modality to inform representations in another, enabling tasks such as visual question answering, image captioning, and multimodal sentiment analysis. | Sentence Embeddings convert a sentence or short text into a single fixed-length dense vector that captures its semantic meaning. These vectors allow downstream tasks (semantic similarity, clustering, retrieval, and classification) to operate on numerical representations instead of raw text, making them one of the most versatile building blocks in modern NLP pipelines. |
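
The two summaries above name one core operation each: cross-modal attention, and pooling into a fixed-length vector that can be compared numerically. The sketch below illustrates both in plain Python under simplifying assumptions. Random vectors stand in for real encoder outputs, the learned query/key/value projections of a real Transformer are omitted, and mean pooling is used as one common pooling choice. All function names are hypothetical and do not come from ViLBERT, CLIP, or Sentence-BERT code.

```python
import math
import random

Vector = list[float]
Matrix = list[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(text: Matrix, image: Matrix) -> Matrix:
    """Each text token (query) attends over all image patches (keys and values).

    Assumption: learned projections are omitted to keep the sketch short.
    """
    dim = len(text[0])
    fused = []
    for token in text:
        # Scaled dot-product scores of this token against every patch.
        weights = softmax([dot(token, patch) / math.sqrt(dim) for patch in image])
        # Weighted average of the image patches, one value per dimension.
        fused.append([sum(w * patch[i] for w, patch in zip(weights, image))
                      for i in range(dim)])
    return fused


def sentence_embedding(token_vectors: Matrix) -> Vector:
    """Mean-pool token vectors into one fixed-length sentence vector."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def random_matrix(rows: int, dim: int, rng: random.Random) -> Matrix:
    """Toy stand-in for encoder outputs (hypothetical data)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]


rng = random.Random(0)
text_tokens = random_matrix(5, 8, rng)     # 5 text tokens, dimension 8
image_patches = random_matrix(12, 8, rng)  # 12 image patches, dimension 8
fused = cross_modal_attention(text_tokens, image_patches)
print(len(fused), len(fused[0]))  # 5 8

# Sentences of different lengths map to vectors of the same length.
short_sentence = sentence_embedding(random_matrix(3, 8, rng))
long_sentence = sentence_embedding(random_matrix(9, 8, rng))
print(len(short_sentence), len(long_sentence))  # 8 8
print(cosine_similarity(short_sentence, long_sentence))
```

The attention output keeps one row per text token, so the text representation is enriched by the image rather than replaced by it. The pooled sentence vectors have the same length regardless of how many tokens went in, which is what lets similarity, clustering, and retrieval operate on them directly.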
ScholarGate dataset
  1. v1
  2. 2 sources
  3. PUBLISHED


ScholarGate. Compare methods: Multimodal Transformer · Sentence Embeddings. Retrieved 2026-06-18 from https://scholargate.app/ar/compare