2603.21720v1 Mar 23, 2026 cs.CL

SemEval-2026 Task 12: Abductive Event Reasoning: Towards Real-World Event Causal Inference for Large Language Models

Pengfei Cao

Institute of Automation, Chinese Academy of Sciences

Citations: 1,563

h-index: 20

Jun Zhao

Citations: 4

h-index: 1

Chenlong Zhang

Citations: 13

h-index: 2

Yubo Chen

Institute of Automation, Chinese Academy of Sciences

Citations: 4,657

h-index: 27

Ming Yang

Citations: 22

h-index: 2

Mingxuan Liu

University of Trento

Citations: 131

h-index: 5

Kang Liu

Citations: 14

h-index: 2

Abstract

Understanding why real-world events occur is important for both natural language processing and practical decision-making, yet direct-cause inference remains underexplored in evidence-rich settings. To address this gap, we organized SemEval-2026 Task 12: Abductive Event Reasoning (AER). (The task data is available at https://github.com/sooo66/semeval2026-task12-dataset.git.) The task asks systems to identify the most plausible direct cause of a target event from supporting evidence. We formulate AER as an evidence-grounded multiple-choice benchmark that captures key challenges of real-world causal reasoning, including distributed evidence, indirect background factors, and semantically related but non-causal distractors. The shared task attracted 122 participants and received 518 submissions. This paper presents the task formulation, dataset construction pipeline, evaluation setup, and system results. AER provides a focused benchmark for abductive reasoning over real-world events and highlights challenges for future work on causal reasoning and multi-document understanding.
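As a rough illustration of the multiple-choice formulation described in the abstract, the sketch below models one item as a target event, a set of evidence snippets, and candidate causes. Everything in it is an assumption made for illustration: the class name `AERInstance`, its field names, the single `gold_index` label, the exact-match `accuracy` function, and the invented subway example. None of it is taken from the released dataset or the official evaluation setup, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class AERInstance:
    """One hypothetical multiple-choice item (all field names are assumptions)."""
    target_event: str
    evidence: Tuple[str, ...]  # supporting documents or snippets
    options: Tuple[str, ...]   # candidate direct causes, including distractors
    gold_index: int            # assumed: exactly one most plausible direct cause


def accuracy(instances: Sequence[AERInstance], predictions: Sequence[int]) -> float:
    """Fraction of items whose predicted option index equals the gold index."""
    if len(instances) != len(predictions):
        raise ValueError("need exactly one prediction per instance")
    if not instances:
        return 0.0
    correct = sum(inst.gold_index == pred
                  for inst, pred in zip(instances, predictions))
    return correct / len(instances)


# Invented example, not drawn from the task data.
example = AERInstance(
    target_event="The city suspended subway service.",
    evidence=(
        "Heavy rain fell across the region overnight.",
        "Officials reported that flood water entered several stations.",
    ),
    options=(
        "Flood water entered several stations.",         # direct cause
        "The region has an aging drainage system.",      # indirect background factor
        "The city opened a new subway line last year.",  # related but non-causal
    ),
    gold_index=0,
)
```

The three options mirror the distractor types the abstract names: a direct cause, an indirect background factor, and a semantically related but non-causal statement.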
