2605.02819v1 May 04, 2026 cs.AI

SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering

Sihong Xie (Citations: 3, h-index: 1)
Hui Xiong (Citations: 2, h-index: 1)
Jiujiu Chen (Citations: 87, h-index: 4)
Yazheng Liu (Citations: 239, h-index: 6)

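Large language models excel at complex reasoning, but evaluating their intermediate steps remains difficult. Process reward models provide step-wise supervision, yet they often suffer from a "risk compensation effect": an incorrect step is offset by later correct steps, so a flawed reasoning path can still receive a high reward. The problem is more severe in knowledge graph (KG) reasoning, because multiple paths may connect the start and end entities in a KG and a single risky step can invalidate the entire reasoning path. This limitation matters in risk-sensitive KG reasoning tasks such as medicine and law. To address it, we propose the Schema-aware Cumulative Process Reward Model (SCPRM), which evaluates a reasoning path by conditioning on the reasoning prefix and incorporating the schema distance between the current reasoning step and the target implicitly parsed from the query. This yields cumulative, forward-looking rewards that guide path exploration. We further integrate SCPRM into Monte Carlo Tree Search (MCTS) as SCPRM-MCTS to perform multi-hop reasoning over KGs for question answering (QA). On medical and legal KGQA and on CWQ, SCPRM-MCTS improves Hits@k by an average of 1.18% over strong baselines, enabling more accurate and risk-sensitive reasoning evaluation.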

Original Abstract

Large language models excel at complex reasoning, yet evaluating their intermediate steps remains challenging. Although process reward models provide step-wise supervision, they often suffer from a risk compensation effect, where incorrect steps are offset by later correct ones, so that flawed reasoning paths receive high rewards. This issue is further exacerbated in knowledge graph (KG) reasoning, as multiple paths may exist between the start and end entities in a KG, and a single risky step can make the whole reasoning path flawed. These limitations are problematic in risk-sensitive tasks such as medical and legal KG reasoning. To address these issues, we propose a Schema-aware Cumulative Process Reward Model (SCPRM) that evaluates reasoning paths by conditioning on the reasoning prefix and incorporating the schema distance between the current reasoning step and the implicit target parsed from the query, which provides cumulative and future rewards to guide path exploration. We further integrate SCPRM into Monte Carlo Tree Search (MCTS) as SCPRM-MCTS to conduct multi-hop reasoning on KGs for question answering (QA) tasks. Across medical and legal KGQA and CWQ, SCPRM-MCTS improves Hits@k by an average of 1.18% over strong baselines, demonstrating more accurate and risk-sensitive reasoning evaluation.
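
The risk compensation effect described in the abstract can be illustrated with a toy comparison. The sketch below is not the paper's formulation, which the abstract does not give. It assumes per-step scores in [0, 1], uses a running product as a stand-in for a prefix-conditioned cumulative reward, and applies a hypothetical discount `1 / (1 + alpha * d)` for a schema distance `d`, taken here to be the hop count from the current entity type to the target type. All function names, `alpha`, and the example scores are illustrative assumptions.

```python
def mean_reward(step_scores):
    """Average of independent per-step scores (prone to risk compensation)."""
    return sum(step_scores) / len(step_scores)


def cumulative_rewards(step_scores):
    """Running product of step scores: one bad step lowers every later reward."""
    rewards = []
    running = 1.0
    for score in step_scores:
        running *= score
        rewards.append(running)
    return rewards


def schema_aware_rewards(step_scores, schema_distances, alpha=0.5):
    """Cumulative reward discounted by a hypothetical schema-distance term."""
    cumulative = cumulative_rewards(step_scores)
    return [c / (1.0 + alpha * d) for c, d in zip(cumulative, schema_distances)]


# A path with one risky step, followed by confident steps (assumed scores).
flawed_path = [1.0, 0.1, 1.0, 1.0]
# A path whose steps are all moderately reliable (assumed scores).
sound_path = [0.75, 0.75, 0.75, 0.75]

# Averaging lets the later steps compensate for the risky one.
flawed_looks_better_by_mean = mean_reward(flawed_path) > mean_reward(sound_path)

# The cumulative view keeps the risky step's penalty until the end of the path.
sound_looks_better_cumulatively = (
    cumulative_rewards(sound_path)[-1] > cumulative_rewards(flawed_path)[-1]
)

# Assumed distances: the path moves one schema hop closer to the target per step.
guided = schema_aware_rewards(sound_path, schema_distances=[3, 2, 1, 0])
```

Under these assumptions, averaging ranks the flawed path above the sound one, while the cumulative score reverses that order. In a search procedure such as MCTS, a per-prefix value like `guided` could serve as the node evaluation, though how SCPRM-MCTS actually combines these terms is specified in the paper itself, not here.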
