Multimodal Question Answering

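Multimodal question answering (Multimodal QA) is a class of deep learning methods that answer natural-language questions by jointly reasoning over information from several modalities, such as text and images, as well as video, audio, and structured tables. Since it was prominently introduced through the VQA benchmark in 2015, it has grown into a broad research area that supports document understanding, medical diagnostic support, and embodied AI.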
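To make "joint reasoning over modalities" concrete, here is a minimal sketch of one simple fusion scheme. It is an illustration under stated assumptions, not the method of any cited paper: the question and image vectors stand in for the outputs of real text and image encoders, the weights are random rather than learned, and the function names (`fuse`, `answer`) and the answer vocabulary are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(question_vec, image_vec):
    """Build a joint representation from two modality vectors.

    Assumption: fusion is concatenation plus an element-wise product,
    one of many possible choices.
    """
    assert len(question_vec) == len(image_vec)
    interaction = [q * v for q, v in zip(question_vec, image_vec)]
    return question_vec + image_vec + interaction


def answer(question_vec, image_vec, weights, answers):
    """Score each candidate answer with a linear layer over the fused vector."""
    joint = fuse(question_vec, image_vec)
    scores = [sum(w * x for w, x in zip(row, joint)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(answers)), key=probs.__getitem__)
    return answers[best], probs


# Toy inputs: random vectors stand in for encoder outputs (assumption),
# and random weights stand in for a trained classifier (assumption).
rng = random.Random(0)
dim = 4
answers = ["yes", "no", "red"]  # hypothetical answer vocabulary
weights = [[rng.uniform(-1, 1) for _ in range(3 * dim)] for _ in answers]
question_vec = [rng.uniform(-1, 1) for _ in range(dim)]
image_vec = [rng.uniform(-1, 1) for _ in range(dim)]

predicted, probs = answer(question_vec, image_vec, weights, answers)
print(predicted, [round(p, 3) for p in probs])
```

Because the weights are random, the predicted answer is meaningless. The point is only the data flow: each modality is encoded separately, the encodings are combined into one vector, and that joint vector is scored against candidate answers.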


Related methods

BERT-based classification, multimodal BERT-based classification, multimodal sentence embedding, multimodal text summarization, multimodal transformer, multimodal named entity recognition

Sources

Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). VQA: Visual Question Answering. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2425–2433. DOI: 10.1109/ICCV.2015.279
Xu, P., Zhu, X., & Clifton, D. A. (2023). Multimodal learning with transformers: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(10), 12113–12132. DOI: 10.1109/TPAMI.2023.3275156

How to cite this page

ScholarGate. (2026, June 3). Multimodal Question Answering (Cross-Modal QA). ScholarGate. https://scholargate.app/ko/deep-learning/multimodal-question-answering


Entries that reference this method

Multimodal named entity recognition, multimodal text summarization
