arXiv:2602.21044v1 | Feb 24, 2026 | cs.AI

LogicGraph: Benchmarking Multi-Path Logical Reasoning via Neuro-Symbolic Generation and Verification

Yanrui Wu (Citations: 70, h-index: 3)
Pengyu Li (Citations: 74, h-index: 5)
Jun Liu (Citations: 154, h-index: 3)
Lingling Zhang (Citations: 22, h-index: 3)
Xinyu Zhang (Citations: 139, h-index: 5)
Jiayu Chang (Citations: 114, h-index: 4)
Xu Jiang (Citations: 42, h-index: 4)
Jing-Tao Hu (Citations: 5, h-index: 1)

Abstract

Evaluations of large language models (LLMs) primarily emphasize convergent logical reasoning, where success is defined by producing a single correct proof. However, many real-world reasoning problems admit multiple valid derivations, requiring models to explore diverse logical paths rather than commit to a single route. To address this limitation, we introduce LogicGraph, the first benchmark designed to systematically evaluate multi-path logical reasoning. It is constructed via a neuro-symbolic framework that leverages backward logic generation and semantic instantiation. This pipeline yields solver-verified reasoning problems characterized by high-depth multi-path reasoning and inherent logical distractors, where each instance is paired with an exhaustive set of minimal proofs. We further propose a reference-free evaluation framework to rigorously assess model performance in both convergent and divergent regimes. Experiments on state-of-the-art language models reveal a common limitation: models tend to commit early to a single route and fail to explore alternatives, and the resulting coverage gap grows substantially with reasoning depth. LogicGraph exposes this divergence gap and provides actionable insights to motivate future improvements. Our code and data will be released at https://github.com/kkkkarry/LogicGraph.
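
The abstract does not describe LogicGraph's internal problem format, so the following is only a toy sketch to make "an exhaustive set of minimal proofs" and a coverage-style score concrete; it is not the authors' pipeline or their evaluation framework. It assumes a problem is a set of propositional Horn rules over atoms, treats a proof as a set of rules that derives the goal from the given facts, and enumerates every minimal proof by brute force rather than with a solver. The `coverage` score (fraction of minimal proofs a model reproduces exactly, after a symbolic validity check) compares against the enumerated gold set, so unlike the reference-free framework described above it is reference-based. All names (`derives`, `minimal_proofs`, `coverage`, `FACTS`, `RULES`) are hypothetical.

```python
from itertools import combinations

# Hypothetical encoding (not from the paper): each rule is (name, premises, conclusion).
Rule = tuple[str, frozenset[str], str]


def derives(facts: set[str], rules: list[Rule], goal: str) -> bool:
    """Forward-chain from `facts` using only `rules`; return True if `goal` is reached."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for _, premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return goal in known


def minimal_proofs(facts: set[str], rules: list[Rule], goal: str) -> list[frozenset[str]]:
    """Enumerate every minimal set of rule names that derives `goal` from `facts`.

    Brute force over rule subsets in order of increasing size (exponential; toy sizes only).
    A deriving subset is minimal iff it contains no smaller proof found earlier.
    """
    found: list[frozenset[str]] = []
    for k in range(len(rules) + 1):
        for subset in combinations(rules, k):
            names = frozenset(name for name, _, _ in subset)
            if any(proof <= names for proof in found):
                continue  # superset of a known proof, so not minimal
            if derives(facts, list(subset), goal):
                found.append(names)
    return found


def coverage(model_proofs: list[set[str]], facts: set[str], rules: list[Rule], goal: str) -> float:
    """Hypothetical score: fraction of minimal proofs the model reproduced exactly."""
    by_name = {rule[0]: rule for rule in rules}
    gold = set(minimal_proofs(facts, rules, goal))
    matched: set[frozenset[str]] = set()
    for proof in model_proofs:
        names = frozenset(proof)
        # Symbolic check: every cited rule must exist and together they must derive the goal.
        if all(n in by_name for n in names) and derives(facts, [by_name[n] for n in names], goal):
            if names in gold:
                matched.add(names)
    return len(matched) / len(gold) if gold else 0.0


# Toy instance: three routes to G, plus a distractor rule r6 whose premise E is never derivable.
FACTS = {"A", "B"}
RULES: list[Rule] = [
    ("r1", frozenset({"A"}), "C"),
    ("r2", frozenset({"C"}), "G"),
    ("r3", frozenset({"B"}), "D"),
    ("r4", frozenset({"D"}), "G"),
    ("r5", frozenset({"A", "B"}), "G"),
    ("r6", frozenset({"E"}), "G"),
]

if __name__ == "__main__":
    print(sorted(sorted(p) for p in minimal_proofs(FACTS, RULES, "G")))
    # → [['r1', 'r2'], ['r3', 'r4'], ['r5']]
    # One exact minimal route, one valid but non-minimal proof, one invalid proof: covers 1 of 3.
    print(coverage([{"r1", "r2"}, {"r1", "r2", "r5"}, {"r3"}], FACTS, RULES, "G"))
    # → 0.3333333333333333
```

In this toy instance, rule r6 plays the role of a logical distractor: it concludes G, but its premise E is never derivable, so it appears in no minimal proof. A model that stops after its first valid route scores well on a convergent check (it found a correct proof) yet covers only one of the three minimal proofs, which is the kind of gap the abstract describes.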

Citations: 0 | Influential citations: 0 | Altmetric: 25.97 | Score: 129.8
