ScholarGate

Method comparison

View the selected methods side by side; rows that differ are highlighted.

Methods compared: Multimodal Transformer, Sentence Embeddings

Field
  Multimodal Transformer: Deep learning
  Sentence Embeddings: Deep learning

Family
  Multimodal Transformer: Machine learning
  Sentence Embeddings: Machine learning

Year introduced
  Multimodal Transformer: 2019–2021
  Sentence Embeddings: 2015–2019

Originators
  Multimodal Transformer: Lu et al. (ViLBERT); Radford et al. (CLIP)
  Sentence Embeddings: Kiros et al. (Skip-Thought, 2015); Reimers & Gurevych (Sentence-BERT, 2019)

Type
  Multimodal Transformer: Cross-modal attention-based deep learning model
  Sentence Embeddings: Representation learning / embedding

Original work
  Multimodal Transformer: Lu, J., Batra, D., Parikh, D., & Lee, S. (2019). ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. Advances in Neural Information Processing Systems (NeurIPS), 32.
  Sentence Embeddings: Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), 3980–3990.

Also known as
  Multimodal Transformer: multimodal attention model, cross-modal transformer, vision-language transformer, multi-modal fusion transformer
  Sentence Embeddings: sentence vectors, sentence representations, SBERT, semantic sentence encoding

Related
  Multimodal Transformer: 5
  Sentence Embeddings: 4

Summary
  Multimodal Transformer: A Multimodal Transformer extends the standard Transformer architecture to process and jointly reason over two or more input modalities, most commonly text and images, but also audio, video, or structured data. Cross-modal attention layers allow information from one modality to inform representations in another, enabling tasks such as visual question answering, image captioning, and multimodal sentiment analysis.
  Sentence Embeddings: Sentence Embeddings convert a sentence or short text into a single fixed-length dense vector that captures its semantic meaning. These vectors allow downstream tasks (semantic similarity, clustering, retrieval, and classification) to operate on numerical representations instead of raw text, making them one of the most versatile building blocks in modern NLP pipelines.
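
The two summaries above describe mechanisms that can be illustrated with a small sketch. The Python below is a toy illustration only, not the implementation used by ViLBERT, CLIP, or Sentence-BERT: `cross_attention` shows scaled dot-product attention in which queries from one modality (here, hypothetical text-token vectors) attend over keys and values from another (hypothetical image-region vectors), and `cosine_similarity` shows how fixed-length sentence vectors can be compared numerically. All vectors and function names are made up for illustration; real models learn projection matrices and use far higher dimensions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention across modalities.

    Each query vector (e.g. a text token) is replaced by a weighted
    average of the value vectors (e.g. image regions), with weights
    given by its similarity to the corresponding keys.
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return fused


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy inputs: 2 text tokens attending over 3 image regions.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(text_tokens, image_regions, image_regions)

# Hypothetical sentence vectors: a and b are close in meaning, c is not.
sent_a = [0.2, 0.8, 0.1]
sent_b = [0.25, 0.75, 0.05]
sent_c = [0.9, 0.1, 0.0]
sim_ab = cosine_similarity(sent_a, sent_b)
sim_ac = cosine_similarity(sent_a, sent_c)
print(sim_ab > sim_ac)
```

In this sketch, each row of `fused` is a text-token representation that now mixes in image information, which is the role cross-modal attention layers play in the summary above. The similarity comparison mirrors how retrieval or clustering can rank sentences by vector proximity instead of by raw text.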
ScholarGate dataset
  Multimodal Transformer: v1, 2 sources, PUBLISHED
  Sentence Embeddings: v1, 2 sources, PUBLISHED


ScholarGate. Method comparison: Multimodal Transformer · Sentence Embeddings. Accessed 2026-06-18 from https://scholargate.app/vi/compare