Multimodal Recurrent Neural Network (MM-RNN)

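Imagine describing what is happening while you watch a video: you generate a sequence of words while simultaneously processing the visual frames and the audio. A multimodal RNN mimics this by encoding the visual stream with a CNN, encoding the audio or text with its own encoder, and feeding both into a recurrent network that generates the output word by word. At each step the RNN has access to its evolving hidden state (a memory of the sequence so far) and to the fused multimodal context, so the generated output stays consistent with all input modalities at once.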
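The generation loop described above can be illustrated with a minimal pure-Python sketch. It is not a reference implementation: the class name `ToyMultimodalRNN`, the weight names, the concatenation-based fusion, and the greedy decoding are all assumptions made for illustration, the encoder outputs are stand-in feature vectors rather than real CNN or audio-encoder features, and the weights are random and untrained, so the emitted word ids carry no meaning.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ToyMultimodalRNN:
    """Hypothetical toy model: an Elman-style RNN conditioned on a fused context."""

    def __init__(self, vocab_size, hidden_size, visual_dim, audio_dim, seed=0):
        rng = random.Random(seed)
        context_dim = visual_dim + audio_dim  # fusion by concatenation (assumption)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.W_h = rand_matrix(hidden_size, hidden_size, rng)  # hidden -> hidden
        self.W_x = rand_matrix(hidden_size, vocab_size, rng)   # previous word -> hidden
        self.W_c = rand_matrix(hidden_size, context_dim, rng)  # fused context -> hidden
        self.W_o = rand_matrix(vocab_size, hidden_size, rng)   # hidden -> word scores

    def step(self, h, prev_word, context):
        """One recurrent step: combine memory, previous word, and fused context."""
        x = [0.0] * self.vocab_size
        x[prev_word] = 1.0  # one-hot encoding of the previously generated word
        from_h = matvec(self.W_h, h)
        from_x = matvec(self.W_x, x)
        from_c = matvec(self.W_c, context)
        h_new = [math.tanh(a + b + c) for a, b, c in zip(from_h, from_x, from_c)]
        scores = matvec(self.W_o, h_new)  # unnormalized score per vocabulary word
        return h_new, scores

    def generate(self, visual_feat, audio_feat, start_word=0, max_len=5):
        """Greedy word-by-word generation; the context is visible at every step."""
        context = list(visual_feat) + list(audio_feat)
        h = [0.0] * self.hidden_size
        word, words = start_word, []
        for _ in range(max_len):
            h, scores = self.step(h, word, context)
            word = max(range(self.vocab_size), key=scores.__getitem__)
            words.append(word)
        return words


# Stand-in encoder outputs: 3 visual features and 2 audio features.
model = ToyMultimodalRNN(vocab_size=6, hidden_size=4, visual_dim=3, audio_dim=2)
caption_ids = model.generate([0.2, -0.1, 0.5], [0.3, 0.7])
print(caption_ids)  # five word ids in the range 0..5
```

Feeding the fused context into every step, rather than only into the initial hidden state, is one of several possible design choices; it is the variant that most directly matches the description of each step seeing both the hidden state and the multimodal context.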

Sources

Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and Tell: A Neural Image Caption Generator. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3156–3164. DOI: 10.1109/CVPR.2015.7298935
Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., & Ng, A. Y. (2011). Multimodal Deep Learning. Proceedings of the 28th International Conference on Machine Learning (ICML), pp. 689–696.

How to cite this page

ScholarGate. (2026, June 3). Multimodal Recurrent Neural Network (MM-RNN). ScholarGate. https://scholargate.app/ko/deep-learning/multimodal-recurrent-neural-network

Which method?

Closely related methods to compare against:

Gated Recurrent Unit (GRU)
Long Short-Term Memory (LSTM)
Multimodal BERT-based classification
Multimodal Convolutional Neural Network
Multimodal Transformer
Recurrent Neural Network

Entries that reference this method

Multimodal Convolutional Neural Network
Multimodal GRU