Multilingual semantic segmentation is a pixel-level scene parsing approach that assigns a semantic class label to every pixel in an image while incorporating cross-lingual capabilities — enabling a single model to recognise scene-text elements, annotations, or training signals drawn from multiple languages. It combines deep encoder-decoder architectures with multilingual language representations, making it applicable to documents, street signs, natural scene images, and medical imagery across diverse linguistic contexts.
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of ECCV 2018. link ↗