2603.01055v1 Mar 01, 2026 cs.AI

MMCOMET: A Large-Scale Multimodal Commonsense Knowledge Graph for Contextual Reasoning

D. Pratama (Citations: 2, h-index: 1)

Caren Han (Citations: 19, h-index: 3)

E. Wang (Citations: 26, h-index: 2)

Hiba Arnaout, Max Planck Institute for Informatics (Citations: 185, h-index: 8)

Shuo Yang (Citations: 6, h-index: 1)

Dan Liu (Citations: 36, h-index: 4)

Jie Yang (Citations: 22, h-index: 2)

J. Poon (Citations: 20, h-index: 2)

Jeff Z. Pan (Citations: 1, h-index: 1)

Abstract

We present MMCOMET, the first multimodal commonsense knowledge graph (MMKG) that integrates physical, social, and eventive knowledge. MMCOMET extends the ATOMIC2020 knowledge graph with a visual dimension through an efficient image retrieval process, resulting in over 900K multimodal triples. This new resource addresses a major limitation of existing MMKGs: their limited support for complex reasoning tasks such as image captioning and storytelling. Through a standard visual storytelling experiment, we show that our holistic approach enables the generation of richer, more coherent, and more contextually grounded stories than those produced using text-only knowledge. This resource establishes a new foundation for multimodal commonsense reasoning and narrative generation.
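
As a rough illustration of what "extending a text triple with a visual dimension" could look like, the sketch below pairs a (head, relation, tail) triple with images returned by a retrieval function. This is not the authors' pipeline: the class `MultimodalTriple`, the function `attach_images`, the toy retriever, and the image paths are all hypothetical, and attaching images separately to the head and the tail is an assumption made only for this example. `ObjectUse` is a relation name from ATOMIC2020.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class MultimodalTriple:
    """A text triple plus image references (hypothetical layout)."""
    head: str
    relation: str
    tail: str
    head_images: Tuple[str, ...] = ()
    tail_images: Tuple[str, ...] = ()


def attach_images(
    text_triples: Iterable[Tuple[str, str, str]],
    retrieve: Callable[[str], List[str]],
) -> List[MultimodalTriple]:
    """Extend text-only triples with images found by `retrieve`.

    `retrieve` stands in for any image retrieval step; it maps a
    phrase to a list of image identifiers (possibly empty).
    """
    return [
        MultimodalTriple(h, r, t, tuple(retrieve(h)), tuple(retrieve(t)))
        for h, r, t in text_triples
    ]


# Toy retriever backed by a dictionary; the paths are made up for illustration.
toy_index = {
    "bread": ["img/bread_01.jpg"],
    "make toast": ["img/toast_01.jpg"],
}


def toy_retrieve(phrase: str) -> List[str]:
    return toy_index.get(phrase, [])


mm_triples = attach_images([("bread", "ObjectUse", "make toast")], toy_retrieve)
```

Passing the retriever in as a callable keeps the triple structure independent of how images are actually found, so a real search backend could be swapped in without changing the data model.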

1 Citation

0 Influential

4 Altmetric

21.0 Score
