ScholarGate

Multimodal Sentence Embeddings

Multimodal sentence embeddings map text and images (and sometimes audio or video) into a shared continuous vector space, so that semantically related items from different modalities, such as an image and its caption, lie close together. Trained with contrastive objectives on large corpora of paired data, these representations power cross-modal retrieval, zero-shot classification, and vision-language reasoning.
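The contrastive objective and the zero-shot use described above can be illustrated with a minimal, dependency-free sketch of a symmetric InfoNCE loss of the kind used by CLIP (Radford et al., 2021). It assumes the image and text encoders have already produced one embedding per item, and that row i of the image batch and row i of the text batch form a matched pair. The encoders are left out, and the function names (`normalize`, `clip_style_loss`, `zero_shot_classify`) and the toy vectors are hypothetical, chosen for illustration rather than taken from any library's API.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable -log(softmax(logits)[target])."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_style_loss(images: Sequence[Sequence[float]],
                    texts: Sequence[Sequence[float]],
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss (hypothetical helper, fixed temperature assumed).

    Row i of `images` and row i of `texts` are a matched pair; every other
    item in the batch serves as a negative for that pair.
    """
    img = [normalize(v) for v in images]
    txt = [normalize(v) for v in texts]
    n = len(img)
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [[dot(img[i], txt[j]) / temperature for j in range(n)] for i in range(n)]
    # Image-to-text: each image must pick out its own caption among all texts.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: the same similarity matrix, read column-wise.
    loss_t2i = sum(cross_entropy([logits[j][i] for j in range(n)], i) for i in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


def zero_shot_classify(image: Sequence[float],
                       label_embeddings: Sequence[Sequence[float]]) -> int:
    """Return the index of the label (text-prompt) embedding closest to the image."""
    img = normalize(image)
    scores = [dot(img, normalize(t)) for t in label_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy usage: aligned pairs should give a lower loss than mismatched pairs.
images = [[1.0, 0.1], [0.1, 1.0]]
texts_matched = [[0.9, 0.0], [0.0, 0.9]]
texts_swapped = [[0.0, 0.9], [0.9, 0.0]]
print(clip_style_loss(images, texts_matched) < clip_style_loss(images, texts_swapped))  # True
print(zero_shot_classify([0.2, 1.0], [[1.0, 0.0], [0.0, 1.0]]))  # 1
```

Dividing the cosine similarities by a small temperature sharpens the softmax over the batch; the sketch treats it as a fixed hyperparameter, whereas CLIP learns it during training. Zero-shot classification reuses the same scoring: each class name is turned into a text prompt, embedded, and the image is assigned the label whose embedding lies closest. A practical implementation would compute the similarity matrix with a tensor library over large batches rather than with Python lists.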

Sources

  1. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), pp. 8748–8763. PMLR.
  2. Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., Ranzato, M., & Mikolov, T. (2013). DeViSE: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 26.

How to cite this page

ScholarGate. (2026, June 3). Multimodal Sentence Embeddings (Joint Vision-Language Representation Learning). ScholarGate. https://scholargate.app/sr/deep-learning/multimodal-sentence-embeddings

Cited in

ScholarGate. Multimodal Sentence Embeddings (Joint Vision-Language Representation Learning). Retrieved 2026-06-15 from https://scholargate.app/sr/deep-learning/multimodal-sentence-embeddings · Dataset: https://doi.org/10.5281/zenodo.20539026