Multimodal BERT-based Classification

Multimodal BERT-based classification extends the BERT transformer architecture to jointly encode and classify data from several modalities, most often text paired with images, by fusing their representations before a final classification head. The approach gained prominence around 2019 through models such as MMBT and ViLBERT, and it has become a standard method for tasks where neither the text nor the image alone carries enough information for correct labeling.
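
The data flow described above (fuse the modality representations, then apply a classification head) can be illustrated with a minimal sketch in plain Python. It is not the implementation of MMBT or ViLBERT. Everything in it is an assumption made for illustration: the dimensions `HIDDEN`, `IMAGE_DIM` and `NUM_CLASSES`, the random weights, the function `fuse_and_classify`, and above all the mean pooling that stands in for a real BERT encoder. It shows only one possible fusion style, in which image features are projected into the text embedding space and concatenated with the text tokens into a single sequence.

```python
import math
import random

random.seed(0)

HIDDEN = 8        # hypothetical shared embedding size
IMAGE_DIM = 16    # hypothetical size of each image feature vector
NUM_CLASSES = 3   # hypothetical number of labels


def random_matrix(rows, cols):
    """Small random weights; in a real model these would be learned."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(x, weight, bias):
    """Apply y = W x + b to a single vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical parameters for the image projection and the classification head.
W_img, b_img = random_matrix(HIDDEN, IMAGE_DIM), [0.0] * HIDDEN
W_cls, b_cls = random_matrix(NUM_CLASSES, HIDDEN), [0.0] * NUM_CLASSES


def fuse_and_classify(text_embeddings, image_features):
    # 1. Project each image feature vector into the text embedding space.
    image_tokens = [linear(f, W_img, b_img) for f in image_features]
    # 2. Fuse: concatenate text and image tokens into one sequence.
    sequence = text_embeddings + image_tokens
    # 3. Stand-in for the transformer encoder: mean-pool over the sequence.
    #    A real model would run self-attention layers here instead.
    pooled = [sum(column) / len(sequence) for column in zip(*sequence)]
    # 4. Classification head: linear layer followed by softmax.
    return sequence, softmax(linear(pooled, W_cls, b_cls))


# Toy inputs: 5 text token embeddings and 2 image feature vectors.
text_embeddings = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(5)]
image_features = [[random.gauss(0, 1) for _ in range(IMAGE_DIM)] for _ in range(2)]

sequence, probs = fuse_and_classify(text_embeddings, image_features)
print(len(sequence), [round(p, 3) for p in probs])
```

The fused sequence holds the text tokens followed by the projected image tokens, and the head returns one probability per class. In practice the pooling step would be replaced by a pretrained encoder, and the weights would be trained on labeled text-image pairs.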

Sources

Kiela, D., Bhooshan, S., Firooz, H., Perez, E., & Testuggine, D. (2019). Supervised multimodal bitransformers for classifying images and text. arXiv preprint arXiv:1909.02950.
Lu, J., Batra, D., Parikh, D., & Lee, S. (2019). ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in Neural Information Processing Systems, 32.

How to cite this page

ScholarGate. (2026, June 3). Multimodal BERT-based Classification (Transformer Fusion of Text and Non-text Modalities). ScholarGate. https://scholargate.app/sv/deep-learning/multimodal-bert-based-classification

Referenced by

Multimodal Convolutional Neural Network, Multimodal Diffusion Model, Multimodal Doc2Vec, Multimodal Graph Network, Multimodal GRU, Multimodal Image Classification, Multimodal LDA Topic Model, Multimodal Named Entity Recognition, Multimodal Question Answering, Multimodal Recurrent Neural Network, Multimodal RoBERTa-based Classification, Multimodal Text Summarization, Multimodal Topic Modeling, Multimodal Transformer, Multimodal Vision Transformer, Multimodal Word2Vec
