
Multimodal Sentence Embeddings

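Multimodal sentence embeddings map text and images (and sometimes audio or video) into a shared continuous vector space in which semantically related pairs from different modalities lie close together. Trained with a contrastive objective on large paired corpora, these representations support cross-modal retrieval, zero-shot classification, and vision-language reasoning.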
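To make the contrastive objective concrete, the following is a minimal sketch using only the Python standard library. It assumes the text and image embeddings have already been produced by some pair of encoders (not shown), and that matching pairs share the same index. The function names, the toy vectors, and the temperature value of 0.07 are illustrative assumptions, not details taken from this page.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(text_embs, image_embs, temperature=0.07):
    """Temperature-scaled cosine similarities: row i is text i, column j is image j."""
    texts = [l2_normalize(v) for v in text_embs]
    images = [l2_normalize(v) for v in image_embs]
    return [
        [sum(a * b for a, b in zip(t, m)) / temperature for m in images]
        for t in texts
    ]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(text_embs, image_embs, temperature=0.07):
    """Average of the text-to-image and image-to-text cross-entropy terms."""
    logits = similarity_matrix(text_embs, image_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


def retrieve(query_emb, candidate_embs):
    """Cross-modal retrieval: index of the candidate most similar to the query."""
    q = l2_normalize(query_emb)
    scores = [
        sum(a * b for a, b in zip(q, l2_normalize(c))) for c in candidate_embs
    ]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings; pair i of text matches pair i of images.
    text = [[1.0, 0.1], [0.0, 1.0]]
    images = [[0.9, 0.0], [0.1, 1.0]]
    print("loss, aligned pairs:", symmetric_contrastive_loss(text, images))
    print("loss, swapped pairs:", symmetric_contrastive_loss(text, images[::-1]))
    print("best image for text 0:", retrieve(text[0], images))
```

The loss is low when each text embedding is closest to its own image and high when the pairs are mismatched, which is what pulls related pairs together during training. The same `retrieve` helper would also serve as a sketch of zero-shot classification if the candidates were embeddings of class-name prompts rather than images.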


Sources

  1. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), pp. 8748–8763. PMLR.
  2. Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., Ranzato, M., & Mikolov, T. (2013). DeViSE: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 26.

How to cite this page

ScholarGate. (2026, June 3). Multimodal Sentence Embeddings (Joint Vision-Language Representation Learning). ScholarGate. https://scholargate.app/zh/deep-learning/multimodal-sentence-embeddings


Cited in

ScholarGate. Multimodal Sentence Embeddings (Joint Vision-Language Representation Learning). Retrieved 2026-06-15 from https://scholargate.app/zh/deep-learning/multimodal-sentence-embeddings · Dataset: https://doi.org/10.5281/zenodo.20539026