ScholarGate

Method comparison

View your selected methods side by side; rows that differ are highlighted.

| Field | Multimodal Gated Recurrent Unit (Multimodal GRU) | Multimodal LSTM |
| --- | --- | --- |
| Domain | Deep learning | Deep learning |
| Method family | Machine learning | Machine learning |
| Year of origin | 2014–2017 | 2016 |
| Proposed by | Cho, K. et al. (GRU); adapted to multimodal settings by multiple research groups | Rajagopalan et al. and various concurrent works (2016–2018) |
| Type | Recurrent neural network (multimodal variant) | Recurrent neural network architecture |
| Seminal reference | Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. Proceedings of EMNLP 2014, 1724–1734. | Rajagopalan, S. S., Morency, L.-P., Baltrušaitis, T., & Goecke, R. (2016). Extending Long Short-Term Memory for Multi-View Structured Learning. In Proceedings of ECCV 2016. Springer. |
| Aliases | MM-GRU, Multimodal Gated Recurrent Unit, Cross-modal GRU, Multi-input GRU | MM-LSTM, multimodal recurrent network, multi-input LSTM, multimodal sequence model |
| Related | 6 | 4 |
| Abstract | Multimodal GRU extends the Gated Recurrent Unit architecture to jointly process sequential data from multiple input modalities (such as text, audio, and video frames) within a single recurrent framework. By fusing modality-specific encodings at the input or hidden-state level, it captures temporal dependencies across heterogeneous data streams. It is used in multimodal sentiment analysis, video understanding, and audio-visual speech recognition. | Multimodal LSTM extends the standard Long Short-Term Memory network to jointly process sequential data from multiple input modalities (such as text, audio, and video) within a unified recurrent architecture. By fusing representations from different sources before or within the LSTM cells, it captures temporal dependencies that span and cross modalities, making it a foundational approach for tasks such as sentiment analysis, video captioning, and affective computing. |
| ScholarGate dataset | v1; 2 sources; PUBLISHED | v1; 2 sources; PUBLISHED |
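
The abstract for Multimodal GRU mentions fusing modality-specific encodings at the input level. As a rough illustration only, the sketch below shows one way that could look: per-timestep feature vectors from three modalities are concatenated and fed to a single hand-written GRU cell. Everything here is an assumption made for illustration and does not come from the compared entries: the class `GRUCell`, the helper `run_early_fusion`, the feature sizes (4, 3, 2), the hidden size, the random weights, and the premise that the three streams are already time-aligned. The state update follows the convention h_t = z * h_{t-1} + (1 - z) * candidate; some presentations swap the roles of z and (1 - z).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Hypothetical minimal GRU cell with untrained, randomly initialised weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One input matrix W, recurrent matrix U, and bias b per gate:
        # "z" = update gate, "r" = reset gate, "h" = candidate state.
        self.W = {g: init_matrix(hidden_size, input_size, rng) for g in "zrh"}
        self.U = {g: init_matrix(hidden_size, hidden_size, rng) for g in "zrh"}
        self.b = {g: [0.0] * hidden_size for g in "zrh"}

    def _pre(self, gate, x, h):
        # Pre-activation W x + U h + b for the given gate.
        wx, uh = matvec(self.W[gate], x), matvec(self.U[gate], h)
        return [a + c + d for a, c, d in zip(wx, uh, self.b[gate])]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._pre("z", x, h)]
        r = [sigmoid(v) for v in self._pre("r", x, h)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in self._pre("h", x, reset_h)]
        # Blend the previous state with the candidate state.
        return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]


def run_early_fusion(cell, text_seq, audio_seq, video_seq):
    """Input-level fusion: concatenate the modalities at each timestep."""
    h = [0.0] * cell.hidden_size
    for t, a, v in zip(text_seq, audio_seq, video_seq):
        fused = t + a + v  # list concatenation, i.e. feature concatenation
        h = cell.step(fused, h)
    return h


# Toy, randomly generated features: 5 aligned timesteps per modality.
data_rng = random.Random(1)
steps = 5
text = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(steps)]
audio = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(steps)]
video = [[data_rng.gauss(0, 1) for _ in range(2)] for _ in range(steps)]

cell = GRUCell(input_size=4 + 3 + 2, hidden_size=6)
final_state = run_early_fusion(cell, text, audio, video)
print(len(final_state))
```

Concatenation is only one fusion choice. Hidden-state-level fusion, which the abstract also mentions, would instead keep a separate recurrent state per modality and combine those states, and real systems would typically use a trained cell from a deep learning library rather than this hand-rolled one.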


ScholarGate method comparison: Multimodal GRU · Multimodal LSTM. Retrieved 2026-06-18 from https://scholargate.app/zh/compare