ScholarGate

Compare Methods

View the selected methods side by side. Rows that differ are highlighted.

| Attribute | Multimodal Object Detection | Multimodal Transformer |
| --- | --- | --- |
| Field | Deep learning | Deep learning |
| Lineage | Machine learning | Machine learning |
| Year proposed | 2015–2019 | 2019–2021 |
| Proposed by | Multiple contributors (e.g., Chen & Deng, Liang et al.) | Lu et al. (ViLBERT); Radford et al. (CLIP) |
| Type | Fusion-based deep detection | Cross-modal attention-based deep learning model |
| Primary source | Liu, Y., Zhang, F., Li, Y., & Lv, H. (2022). Multimodal Object Detection via Bayesian Fusion. IEEE Transactions on Image Processing, 31, 5953–5965. | Lu, J., Batra, D., Parikh, D., & Lee, S. (2019). ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. Advances in Neural Information Processing Systems (NeurIPS), 32. |
| Also known as | multi-sensor object detection, cross-modal detection, RGB-D object detection, fusion-based object detection | multimodal attention model, cross-modal transformer, vision-language transformer, multi-modal fusion transformer |
| Related | 6 | 5 |
| Overview | Multimodal object detection extends single-modality object detectors by jointly processing signals from multiple sensor types (such as RGB cameras, depth sensors, LiDAR, radar, or text descriptions) to localize and classify objects with higher accuracy and robustness than any single modality achieves alone. Fusing complementary information is the core design principle. | A Multimodal Transformer extends the standard Transformer architecture to process and jointly reason over two or more input modalities, most commonly text and images, but also audio, video, or structured data. Cross-modal attention layers let information from one modality inform the representations of another, enabling tasks such as visual question answering, image captioning, and multimodal sentiment analysis. |
| ScholarGate dataset | v1, 2 sources, PUBLISHED | v1, 2 sources, PUBLISHED |
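
The two core ideas named in the overviews, fusing complementary signals and cross-modal attention, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `late_fusion` and `cross_attention` are hypothetical, the fusion rule is a plain weighted average of per-class scores (not the Bayesian fusion of the cited paper), and the attention is a single-head scaled dot-product without learned projections, which real models such as ViLBERT do use.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention across modalities.

    Queries come from one modality (e.g. text tokens); keys and values
    come from another (e.g. image regions). No learned projections are
    applied, which is a simplification for illustration.
    """
    d = len(keys[0])
    attended = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        attended.append([sum(w * v[j] for w, v in zip(weights, values))
                         for j in range(len(values[0]))])
    return attended


def late_fusion(scores_by_modality, weights=None):
    """Weighted average of per-class confidence scores, one list per modality.

    Illustrative stand-in for score-level fusion; equal weights by default.
    """
    n = len(scores_by_modality)
    if weights is None:
        weights = [1.0 / n] * n
    num_classes = len(scores_by_modality[0])
    return [sum(w * s[c] for w, s in zip(weights, scores_by_modality))
            for c in range(num_classes)]


# Hypothetical toy inputs: two text tokens attend over three image regions.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = cross_attention(text_tokens, image_regions, image_regions)

# Hypothetical per-class scores from an RGB detector and a depth detector.
fused = late_fusion([[0.9, 0.1], [0.5, 0.5]])
```

In this sketch, `attended` holds one image-informed vector per text token, and `fused` combines the two detectors' class scores into a single score per class. Fusion can also happen earlier, at the input or feature level; score-level fusion is shown only because it is the simplest to write down.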


ScholarGate, Compare Methods: Multimodal Object Detection · Multimodal Transformer. Retrieved 2026-06-17 from https://scholargate.app/ja/compare