2601.01195v1 Jan 03, 2026 cs.AI

Reinforcement Learning Enhanced Multi-hop Reasoning for Temporal Knowledge Question Answering

Wuzhenghong Wen (Citations: 32, h-index: 2)
Su Pan (Citations: 3, h-index: 1)
Yuwei Sun (Citations: 7, h-index: 2)
Minlong Peng (Citations: 3, h-index: 1)
Chao Xue (Citations: 59, h-index: 5)

Original Abstract

Temporal knowledge graph question answering (TKGQA) involves multi-hop reasoning over temporally constrained entity relationships in the knowledge graph to answer a given question. However, at each hop, large language models (LLMs) retrieve subgraphs with numerous temporally similar and semantically complex relations, increasing the risk of suboptimal decisions and error propagation. To address these challenges, we propose the multi-hop reasoning enhanced (MRE) framework, which enhances both forward and backward reasoning to improve the identification of globally optimal reasoning trajectories. Specifically, MRE begins with prompt engineering to guide the LLM in generating diverse reasoning trajectories for a given question. Valid reasoning trajectories are then selected for supervised fine-tuning, serving as a cold-start strategy. Finally, we introduce Tree-Group Relative Policy Optimization (T-GRPO), a recursive, tree-structured learning-by-exploration approach. At each hop, exploration establishes strong causal dependencies on the previous hop, while evaluation is informed by multi-path exploration feedback from subsequent hops. Experimental results on two TKGQA benchmarks indicate that the proposed MRE-based model consistently surpasses state-of-the-art (SOTA) approaches in handling complex multi-hop queries. Further analysis highlights improved interpretability and robustness to noisy temporal annotations.
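The abstract describes T-GRPO only at a high level. The sketch below illustrates one plausible reading of its two ingredients:
- a tree of sampled reasoning hops, where each hop's evaluation is backed up from the sampled continuations at later hops;
- a GRPO-style advantage, normalized within each group of sibling hops that share the same parent.

This is not the authors' implementation. Several choices are assumptions made for illustration only:
- the `HopNode` structure;
- a binary terminal reward (1.0 if the final answer is correct);
- a node value equal to the mean of its children's values;
- per-sibling-group mean/std normalization.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class HopNode:
    """One sampled reasoning hop (hypothetical structure, not from the paper)."""
    action: str                                   # e.g. relation/entity chosen at this hop
    reward: float = 0.0                           # assumed terminal reward, used only at leaves
    children: "list[HopNode]" = field(default_factory=list)
    value: float = 0.0                            # set by backup()
    advantage: float = 0.0                        # set by assign_advantages()


def backup(node: HopNode) -> float:
    """Assumed evaluation: a leaf keeps its reward; an inner hop takes the
    mean value of its sampled continuations (multi-path feedback)."""
    if not node.children:
        node.value = node.reward
    else:
        node.value = mean([backup(child) for child in node.children])
    return node.value


def assign_advantages(node: HopNode, eps: float = 1e-6) -> None:
    """GRPO-style normalization within each sibling group (children of one parent)."""
    if not node.children:
        return
    values = [child.value for child in node.children]
    mu, sigma = mean(values), pstdev(values)
    for child in node.children:
        child.advantage = (child.value - mu) / (sigma + eps)
        assign_advantages(child, eps)


# Toy tree: two candidate first hops, each with two sampled second hops.
r1 = HopNode("hop1:r1", children=[HopNode("ans:a", reward=1.0), HopNode("ans:b", reward=0.0)])
r2 = HopNode("hop1:r2", children=[HopNode("ans:c", reward=0.0), HopNode("ans:d", reward=0.0)])
root = HopNode("question", children=[r1, r2])

backup(root)             # r1.value = 0.5, r2.value = 0.0, root.value = 0.25
assign_advantages(root)  # r1 gets a positive advantage, r2 a negative one
```

In this toy tree, the first hop `r1` receives credit because one of its continuations reaches a correct answer. The continuations under `r2` all fail, so their group has zero variance, and each gets an advantage of 0. In full GRPO, these advantages would weight a clipped policy-ratio objective over the hop tokens; that part is omitted here.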

2 Citations | 0 Influential | 2.5 Altmetric | 14.5 Score
