Multimodal Sentence Embeddings

Multimodal sentence embeddings map text and images (and sometimes audio or video) into a shared continuous vector space, so that semantically related pairs from different modalities lie close to one another. Trained with contrastive objectives on large paired corpora, these embeddings power cross-modal retrieval, zero-shot classification, and vision-language reasoning.
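To make the contrastive objective and cross-modal retrieval concrete, here is a minimal NumPy sketch. It assumes the encoders have already produced one embedding per text and per image, with row i of each matrix forming a matched pair. The function names, the toy one-hot "image" embeddings, and the temperature of 0.07 are illustrative assumptions, not details taken from this page; the loss shown is a symmetric InfoNCE-style formulation, which is one common choice among several.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss; row i of each input is a matched pair.

    The temperature default is an assumed, illustrative value.
    """
    t = l2_normalize(text_emb)
    v = l2_normalize(image_emb)
    logits = t @ v.T / temperature           # (batch, batch) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_t2i = -log_softmax(logits, axis=1)[idx, idx].mean()  # each text picks its image
    loss_i2t = -log_softmax(logits, axis=0)[idx, idx].mean()  # each image picks its text
    return 0.5 * (loss_t2i + loss_i2t)

def retrieve(query_emb, candidate_emb):
    """For each query, return the index of the most cosine-similar candidate."""
    sims = l2_normalize(query_emb) @ l2_normalize(candidate_emb).T
    return sims.argmax(axis=1)

# Toy data (hypothetical): four orthogonal "image" embeddings and
# "text" embeddings that are slightly perturbed copies of them.
rng = np.random.default_rng(0)
image_emb = np.eye(4, 8)
aligned_text = image_emb + 0.01 * rng.normal(size=(4, 8))
shuffled_text = aligned_text[[1, 2, 3, 0]]   # deliberately break the pairing

loss_aligned = symmetric_contrastive_loss(aligned_text, image_emb)
loss_shuffled = symmetric_contrastive_loss(shuffled_text, image_emb)
matches = retrieve(aligned_text, image_emb)  # text-to-image retrieval

print("loss (aligned pairs): ", loss_aligned)
print("loss (shuffled pairs):", loss_shuffled)
print("retrieved image index per text:", matches.tolist())
```

The loss is low when matched pairs are each other's nearest neighbours and high when the pairing is broken, which is the signal that pulls related text and images together during training. Zero-shot classification can reuse `retrieve` by embedding one text prompt per class label and treating the prompts as the candidates for each image.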

How to cite this page

ScholarGate. (2026, June 3). Multimodal Sentence Embeddings (Joint Vision-Language Representation Learning). ScholarGate. https://scholargate.app/hr/deep-learning/multimodal-sentence-embeddings

Cited in

ScholarGate. Multimodal Sentence Embeddings (Joint Vision-Language Representation Learning). Retrieved 2026-06-15 from https://scholargate.app/hr/deep-learning/multimodal-sentence-embeddings · Dataset: https://doi.org/10.5281/zenodo.20539026