ScholarGate

Multimodal Text Summarization

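Multimodal text summarization uses deep learning models to jointly process several input modalities (most commonly text and images, but also video frames or audio) and to align their visual and linguistic representations, producing a concise textual summary. The resulting natural-language summary captures the salient content of all available modalities.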
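The alignment step described above can be illustrated with a minimal sketch. It assumes scaled dot-product attention as the alignment mechanism and a simple extractive selection rule. Both are illustrative choices, not the method of any cited paper. The function names (`cross_modal_attention`, `select_sentences`) and the two-dimensional toy vectors are hypothetical; in practice the vectors would come from trained text and image encoders.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_modal_attention(text_vecs, image_vecs):
    """For each text vector, attend over image-region vectors.

    Returns a list of (weights, context) pairs, where `weights` is a
    distribution over image regions and `context` is their weighted sum.
    """
    dim = len(image_vecs[0])
    scale = math.sqrt(dim)
    results = []
    for t in text_vecs:
        weights = softmax([dot(t, v) / scale for v in image_vecs])
        context = [sum(w * v[i] for w, v in zip(weights, image_vecs))
                   for i in range(dim)]
        results.append((weights, context))
    return results

def select_sentences(sentences, text_vecs, image_vecs, k=1):
    """Hypothetical extractive rule: keep the k sentences whose vectors
    agree most with their attended image context, in original order."""
    attended = cross_modal_attention(text_vecs, image_vecs)
    scores = [dot(t, ctx) for t, (_, ctx) in zip(text_vecs, attended)]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

# Toy data (assumed): axis 0 ~ "dog/beach" content, axis 1 ~ "finance" content.
sentences = [
    "A dog runs on the beach.",
    "Stock markets closed higher.",
    "The dog catches a frisbee.",
]
text_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
image_vecs = [[1.0, 0.0], [0.8, 0.2]]  # two image regions, both dog-like

summary = select_sentences(sentences, text_vecs, image_vecs, k=2)
print(summary)
```

With these toy vectors the two dog-related sentences score highest, because their vectors point in the same direction as the image regions. An abstractive system would instead feed the fused representations to a decoder that generates new text.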

Sources

  1. Zhu, J., Li, H., Liu, T., Zhou, Y., Zhang, J., & Zong, C. (2018). MSMO: Multimodal Summarization with Multimodal Output. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), 4154–4164.
  2. Zhu, J., Zhou, Y., Zhang, J., Li, H., Zong, C., & Li, C. (2020). Multimodal Summarization with Guidance of Multimodal Reference. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 9749–9756.

How to cite this page

ScholarGate. (2026, June 3). Multimodal Text Summarization (Cross-Modal Abstractive and Extractive Summarization). ScholarGate. https://scholargate.app/zh/deep-learning/multimodal-text-summarization

Cited as

ScholarGate. Multimodal Text Summarization (Cross-Modal Abstractive and Extractive Summarization). Retrieved 2026-06-15 from https://scholargate.app/zh/deep-learning/multimodal-text-summarization · Dataset: https://doi.org/10.5281/zenodo.20539026