
Multimodal sentence embeddings

Multimodal sentence embeddings map text and images (and sometimes audio or video) into a shared continuous vector space, so that semantically related pairs from different modalities land close to each other. These representations, trained with contrastive objectives on large paired corpora, power cross-modal retrieval, zero-shot classification, and vision-language reasoning.
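
To make the contrastive objective concrete, here is a minimal sketch in plain Python. It assumes the image and text encoders already exist and have produced one vector per item; the toy vectors, the function names, and the temperature of 0.07 are illustrative assumptions, not values taken from this page. The loss is a symmetric cross-entropy over cosine similarities, in the style of the contrastive objectives used for joint vision-language training: row k of the batch treats text k as the correct match for image k, and the other texts as negatives.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(image_embs, text_embs):
    """Cosine similarity between every image (rows) and every text (columns)."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]

def cross_entropy_rows(logits):
    """Mean cross-entropy where the correct class for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[k]
    return total / len(logits)

def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Average of the image-to-text and text-to-image losses.

    The temperature value is an assumed hyperparameter for illustration.
    """
    sims = similarity_matrix(image_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # transpose: texts as rows
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(logits_t))

def retrieve(query_emb, candidate_embs):
    """Cross-modal retrieval: index of the candidate closest to the query."""
    sims = similarity_matrix([query_emb], candidate_embs)[0]
    return max(range(len(sims)), key=lambda j: sims[j])

# Toy embeddings (hypothetical): image k and text k point in similar directions.
image_embs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
text_embs = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0], [0.0, 0.2, 0.7]]

aligned_loss = symmetric_contrastive_loss(image_embs, text_embs)
shuffled_loss = symmetric_contrastive_loss(image_embs, text_embs[::-1])
best = retrieve(image_embs[1], text_embs)
```

With correctly paired embeddings the loss is lower than with the texts in reversed order, which is the signal training would use to pull matching pairs together. Zero-shot classification can be read as the same `retrieve` call, with one text embedding per class description as the candidates.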

Sources

  1. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), pp. 8748–8763. PMLR.
  2. Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., Ranzato, M., & Mikolov, T. (2013). DeViSE: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 26.

How to cite this page

ScholarGate. (2026, June 3). Multimodal Sentence Embeddings (Joint Vision-Language Representation Learning). ScholarGate. https://scholargate.app/no/deep-learning/multimodal-sentence-embeddings

Referenced by

ScholarGate. Multimodal Sentence Embeddings (Joint Vision-Language Representation Learning). Retrieved 2026-06-15 from https://scholargate.app/no/deep-learning/multimodal-sentence-embeddings · Dataset: https://doi.org/10.5281/zenodo.20539026