Multimodal Recurrent Neural Network
A Multimodal Recurrent Neural Network combines input from two or more data modalities, such as images, text, and audio, within a recurrent sequence-processing framework. It encodes each modality separately, fuses the representations, and then passes the combined signal through recurrent units (RNN, LSTM, or GRU) to generate or classify sequential outputs. This design made it a foundational approach in image captioning, video description, and audio-visual speech recognition.
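The encode, fuse, and recur pipeline described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the method of any cited paper: the class name `MultimodalRNN`, the dimensions, concatenation as the fusion step, a plain tanh (Elman-style) recurrence in place of an LSTM or GRU, and the random untrained weights are all hypothetical choices made to keep the example self-contained.

```python
import math
import random


def init_matrix(rows, cols, rng):
    """Random weight matrix (list of rows) scaled by 1/sqrt(cols)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def linear(weights, bias, x):
    """Affine map W x + b for a list-of-rows matrix and a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MultimodalRNN:
    """Toy two-modality RNN: separate encoders, concatenation fusion, tanh recurrence."""

    def __init__(self, image_dim, text_dim, enc_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)  # fixed seed so runs are reproducible
        self.hidden_dim = hidden_dim
        # One encoder per modality (assumed to be single linear layers here).
        self.W_img = init_matrix(enc_dim, image_dim, rng)
        self.b_img = [0.0] * enc_dim
        self.W_txt = init_matrix(enc_dim, text_dim, rng)
        self.b_txt = [0.0] * enc_dim
        # Recurrent cell operating on the fused (concatenated) encoding.
        self.W_in = init_matrix(hidden_dim, 2 * enc_dim, rng)
        self.W_rec = init_matrix(hidden_dim, hidden_dim, rng)
        self.b_h = [0.0] * hidden_dim
        # Classifier head applied to the final hidden state.
        self.W_out = init_matrix(num_classes, hidden_dim, rng)
        self.b_out = [0.0] * num_classes

    def step(self, image_feat, text_feat, h_prev):
        """One time step: encode each modality, fuse, update the hidden state."""
        img = [math.tanh(v) for v in linear(self.W_img, self.b_img, image_feat)]
        txt = [math.tanh(v) for v in linear(self.W_txt, self.b_txt, text_feat)]
        fused = img + txt  # fusion by concatenating the two encodings
        from_input = linear(self.W_in, self.b_h, fused)
        from_state = linear(self.W_rec, [0.0] * self.hidden_dim, h_prev)
        return [math.tanh(a + b) for a, b in zip(from_input, from_state)]

    def forward(self, image_seq, text_seq):
        """Run over aligned sequences and return class probabilities."""
        h = [0.0] * self.hidden_dim
        for image_feat, text_feat in zip(image_seq, text_seq):
            h = self.step(image_feat, text_feat, h)
        return softmax(linear(self.W_out, self.b_out, h))


# Usage: three time steps of made-up 4-d "image" and 3-d "text" features.
model = MultimodalRNN(image_dim=4, text_dim=3, enc_dim=5, hidden_dim=6, num_classes=2)
image_seq = [[0.1 * (t + i) for i in range(4)] for t in range(3)]
text_seq = [[0.2 * (t - i) for i in range(3)] for t in range(3)]
probs = model.forward(image_seq, text_seq)
print(probs)
```

The sketch fuses the modalities at every time step and classifies from the last hidden state. Generation tasks such as captioning would instead emit an output per step, and a practical system would swap the tanh cell for a trained LSTM or GRU.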
Sources
- Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and Tell: A Neural Image Caption Generator. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3156–3164. DOI: 10.1109/CVPR.2015.7298935
- Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., & Ng, A. Y. (2011). Multimodal Deep Learning. Proceedings of the 28th International Conference on Machine Learning (ICML), pp. 689–696.
How to cite this page
ScholarGate. (2026, June 3). Multimodal Recurrent Neural Network (MM-RNN). ScholarGate. https://scholargate.app/da/deep-learning/multimodal-recurrent-neural-network
Related methods
- Gated Recurrent Unit (GRU)
- Long Short-Term Memory (LSTM)
- Multimodal BERT-based classification
- Multimodal Convolutional Neural Network
- Multimodal Transformer
- Recurrent Neural Network