Multimodal Named Entity Recognition

Multimodal Named Entity Recognition (MNER) extends classical named entity recognition (NER) by fusing a text sequence with a complementary modality, most commonly an accompanying image. The goal is to better detect and classify named entities such as persons, organizations, and locations when the text alone is short or ambiguous and visual context helps resolve it, as in social media posts that pair a brief caption with a photo.
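The fusion idea can be made concrete with a small sketch. The code below is a hypothetical, parameter-free illustration, not the method of any specific MNER paper. It assumes each token and each image region is already encoded as a vector of the same size (real systems learn these, for example with a text encoder and an image CNN). Each token attends over the image regions with dot-product attention, the resulting visual context is mixed in through a scalar gate, and each token's tag is picked by a greedy argmax over hand-set tag weights. All function names, the 2-dimensional toy features, and the weights are assumptions made for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def attend_to_image(token: Vector, regions: List[Vector]) -> Vector:
    """Dot-product attention: weight each image region by its similarity to
    the token, then return the weighted average of the region vectors."""
    weights = softmax([dot(token, r) for r in regions])
    return [sum(w * r[k] for w, r in zip(weights, regions))
            for k in range(len(token))]


def fuse(token: Vector, visual: Vector) -> Vector:
    """Gated fusion: a scalar gate in (0, 1), driven by text/image agreement,
    controls how much of the visual context vector is added to the token.
    (Hypothetical: the gate has no learned parameters here.)"""
    gate = sigmoid(dot(token, visual))
    return [t + gate * v for t, v in zip(token, visual)]


def tag_sequence(tokens: List[Vector], regions: List[Vector],
                 tag_weights: Dict[str, Vector]) -> List[str]:
    """Score each tag on every fused token and take the per-token argmax
    (greedy decoding, no CRF transition scores)."""
    tags = []
    for tok in tokens:
        fused = fuse(tok, attend_to_image(tok, regions))
        tags.append(max(tag_weights, key=lambda tag: dot(tag_weights[tag], fused)))
    return tags


# Toy 2-d features: dimension 0 ~ "person-like", dimension 1 ~ "place-like".
TAG_WEIGHTS = {"B-PER": [1.0, 0.0], "B-LOC": [0.0, 1.0], "O": [-1.0, -1.0]}
tokens = [[0.5, 0.5], [-1.0, -1.0]]       # "Jordan" (ambiguous), "today"
person_photo = [[2.0, 0.0], [0.0, 0.0]]   # face region + empty background
landscape = [[0.0, 2.0], [0.0, 0.0]]      # scenery region + empty background

print(tag_sequence(tokens, person_photo, TAG_WEIGHTS))  # ['B-PER', 'O']
print(tag_sequence(tokens, landscape, TAG_WEIGHTS))     # ['B-LOC', 'O']
```

The ambiguous token "Jordan" is tagged as a person when paired with a photo dominated by a face region and as a location when paired with a landscape, while the non-entity token stays O in both cases. Practical systems learn the attention and gate parameters end to end and often decode with a CRF layer rather than a per-token argmax.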

ScholarGate. Multimodal Named Entity Recognition (Text + Visual/Auxiliary Modality NER). Retrieved 2026-06-04 from https://scholargate.app/en/deep-learning/multimodal-named-entity-recognition