2601.18296v1 Jan 26, 2026 cs.CL

Temp-R1: A Unified Autonomous Agent for Complex Temporal KGQA via Reverse Curriculum Reinforcement Learning

Xinle Deng (Citations: 141, h-index: 6)
Zhaoyan Gong (Citations: 26, h-index: 3)
Zhiqiang Liu (Citations: 62, h-index: 5)
Songze Li (Citations: 30, h-index: 3)
Xiaoke Guo (Citations: 22, h-index: 3)
Yuanxiang Liu (Citations: 11, h-index: 2)
Zhizhen Liu (Citations: 45, h-index: 3)
Lei Liang (Citations: 104, h-index: 5)
Hua-zeng Chen (Citations: 1,507, h-index: 23)
Wen Zhang (Citations: 35, h-index: 3)

Abstract

Temporal Knowledge Graph Question Answering (TKGQA) is inherently challenging, as it requires sophisticated reasoning over dynamic facts with multi-hop dependencies and complex temporal constraints. Existing methods rely on fixed workflows and expensive closed-source APIs, which limits their flexibility and scalability. We propose Temp-R1, the first autonomous end-to-end agent for TKGQA trained through reinforcement learning. To address cognitive overload in single-action reasoning, we expand the action space with specialized internal actions alongside external actions. To prevent shortcut learning on simple questions, we introduce reverse curriculum learning, which trains on difficult questions first and so forces the development of sophisticated reasoning before transferring to easier cases. Our 8B-parameter Temp-R1 achieves state-of-the-art performance on MultiTQ and TimelineKGQA, improving by 19.8% over strong baselines on complex questions. Our work establishes a new paradigm for autonomous temporal reasoning agents. Our code will be publicly available soon at https://github.com/zjukg/Temp-R1.
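The hardest-first ordering behind reverse curriculum learning can be illustrated with a minimal sketch. This is not the authors' implementation: the `Question` type, its integer `difficulty` label, and both function names are hypothetical, and the abstract does not say how difficulty is measured or how training stages are scheduled. The sketch only shows grouping questions into stages and visiting the hardest stage first.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass(frozen=True)
class Question:
    text: str
    difficulty: int  # hypothetical label: a larger value means a harder question


def reverse_curriculum(questions: Sequence[Question]) -> List[List[Question]]:
    """Group questions into stages ordered from hardest to easiest.

    Questions keep their original relative order within a stage.
    """
    levels = sorted({q.difficulty for q in questions}, reverse=True)
    return [[q for q in questions if q.difficulty == level] for level in levels]


def training_order(questions: Sequence[Question]) -> Iterator[Question]:
    """Yield questions stage by stage, hardest stage first."""
    for stage in reverse_curriculum(questions):
        yield from stage
```

A training loop would consume `training_order(...)` in place of a shuffled or easy-first stream; a conventional curriculum would differ only in sorting the levels in ascending order.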

7 Citations

1 Influential

38.431471805599 Altmetric

201.2 Score
