2602.05818v2 Feb 05, 2026 cs.AI

TKG-Thinker: Towards Dynamic Reasoning over Temporal Knowledge Graphs via Agentic Reinforcement Learning

Wei Wang

Citations: 0

h-index: 0

Minlong Peng

Citations: 3

h-index: 1

Zihao Jiang

Citations: 83

h-index: 4

Miao Peng

Wuhan University

Citations: 107

h-index: 5

Zhenyan Shan

Citations: 5

h-index: 1

Ben Liu

Citations: 137

h-index: 6

Wenjie Xu

Citations: 127

h-index: 6

Gong Chen

Citations: 4

h-index: 1

Abstract

Temporal knowledge graph question answering (TKGQA) aims to answer time-sensitive questions by leveraging temporal knowledge bases. While Large Language Models (LLMs) demonstrate significant potential in TKGQA, current prompting strategies constrain their efficacy in two primary ways. First, they are prone to reasoning hallucinations under complex temporal constraints. Second, static prompting limits model autonomy and generalization, as it lacks optimization through dynamic interaction with temporal knowledge graph (TKG) environments. To address these limitations, we propose TKG-Thinker, a novel agent equipped with autonomous planning and adaptive retrieval capabilities for reasoning over TKGs. Specifically, TKG-Thinker performs in-depth temporal reasoning through dynamic multi-turn interactions with TKGs via a dual-training strategy. We first apply Supervised Fine-Tuning (SFT) with chain-of-thought data to instill core planning capabilities, followed by a Reinforcement Learning (RL) stage that leverages multi-dimensional rewards to refine reasoning policies under intricate temporal constraints. Experimental results on benchmark datasets with three open-source LLMs show that TKG-Thinker achieves state-of-the-art performance and exhibits strong generalization across complex TKGQA settings.
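To make the notion of a "temporal constraint" concrete, here is a minimal sketch of a time-constrained lookup over a toy temporal knowledge graph. It is not the TKG-Thinker method, which the abstract does not specify in detail. It assumes that facts are stored as (subject, relation, object, year) quadruples, and the names `Fact`, `TOY_TKG`, `retrieve`, `last_before`, and `first_after`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """One timestamped fact (assumed quadruple layout, not taken from the paper)."""
    subject: str
    relation: str
    obj: str
    year: int


# Toy facts invented purely for illustration.
TOY_TKG: List[Fact] = [
    Fact("Alice", "works_at", "AcmeCorp", 2010),
    Fact("Alice", "works_at", "Globex", 2015),
    Fact("Alice", "works_at", "Initech", 2020),
]


def retrieve(
    tkg: List[Fact],
    subject: str,
    relation: str,
    before: Optional[int] = None,
    after: Optional[int] = None,
) -> List[Fact]:
    """Return matching facts sorted by year, restricted to an optional open time window."""
    hits = [
        f
        for f in tkg
        if f.subject == subject
        and f.relation == relation
        and (before is None or f.year < before)
        and (after is None or f.year > after)
    ]
    return sorted(hits, key=lambda f: f.year)


def last_before(tkg: List[Fact], subject: str, relation: str, year: int) -> Optional[str]:
    """Answer 'what was the most recent object strictly before `year`?'."""
    hits = retrieve(tkg, subject, relation, before=year)
    return hits[-1].obj if hits else None


def first_after(tkg: List[Fact], subject: str, relation: str, year: int) -> Optional[str]:
    """Answer 'what was the earliest object strictly after `year`?'."""
    hits = retrieve(tkg, subject, relation, after=year)
    return hits[0].obj if hits else None
```

A question such as "Where did Alice work last before 2018?" maps to `last_before(TOY_TKG, "Alice", "works_at", 2018)`. An agent of the kind the abstract describes would presumably issue several such retrievals across turns, but how it plans and chains them is not stated here.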
