2602.06939v1 Feb 06, 2026 cs.LG

Cochain Perspectives on Temporal-Difference Signals for Learning Beyond Markov Dynamics

Sizhe Tang

Citations: 11

h-index: 2

Zuyuan Zhang

Citations: 85

h-index: 7

Tian Lan

Citations: 32

h-index: 3

Abstract

Non-Markovian dynamics are common in real-world environments because of long-range dependencies, partial observability, and memory effects. The Bellman equation, the central pillar of reinforcement learning (RL), is only approximately valid under non-Markovian dynamics. Existing work often focuses on practical algorithm design and offers limited theoretical treatment of key questions, such as which dynamics the Bellman framework can actually capture and how optimal approximations can inspire new classes of algorithms. In this paper, we present a novel topological viewpoint on temporal-difference (TD) based RL. We show that TD errors can be viewed as 1-cochains in the topological space of state transitions, while Markov dynamics are interpreted as topological integrability. This view yields a Hodge-type decomposition of TD errors into an integrable component and a topological residual, through a Bellman-de Rham projection. We further propose HodgeFlow Policy Search (HFPS), which fits a potential network to minimize the non-integrable projection residual in RL and achieves stability/sensitivity guarantees. In numerical evaluations, HFPS is shown to significantly improve RL performance under non-Markovian dynamics.
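To make the decomposition idea in the abstract concrete, here is a minimal sketch of a standard graph Hodge-style projection, not the paper's Bellman-de Rham projection or the HFPS algorithm. It assumes a small directed transition graph, treats a signal on edges as a 1-cochain, and uses ordinary least squares to split it into a gradient (integrable) part and a residual. The function name `hodge_project`, the undiscounted coboundary, and the three-state cycle are all assumptions made for illustration.

```python
import numpy as np

def hodge_project(num_nodes, edges, f):
    """Split an edge signal f into a gradient part and a residual.

    Assumption: the coboundary is the plain graph difference
    (D @ phi)[e] = phi[head] - phi[tail], with no discount factor.
    """
    # Build the edge-by-node incidence (coboundary) matrix D.
    D = np.zeros((len(edges), num_nodes))
    for k, (tail, head) in enumerate(edges):
        D[k, tail] = -1.0
        D[k, head] = 1.0
    # Least-squares potential: phi minimizes ||f - D @ phi||^2.
    phi, *_ = np.linalg.lstsq(D, f, rcond=None)
    grad_part = D @ phi          # integrable component
    residual = f - grad_part     # non-integrable remainder
    return phi, grad_part, residual

# Hypothetical 3-state cycle: 0 -> 1 -> 2 -> 0.
edges = [(0, 1), (1, 2), (2, 0)]

# An integrable signal is the difference of a potential along each edge.
true_phi = np.array([0.0, 1.0, 3.0])
integrable = np.array([true_phi[h] - true_phi[t] for t, h in edges])

# A constant flow around the cycle cannot come from any potential.
circulation = np.array([1.0, 1.0, 1.0])

f = integrable + 0.5 * circulation
phi, grad_part, residual = hodge_project(3, edges, f)

print("gradient part:", grad_part)
print("residual:     ", residual)
```

In this toy case the projection recovers the integrable signal as the gradient part and leaves the scaled circulation as the residual, which sums to a nonzero value around the cycle. How the paper's projection handles discounting, rewards, and learned potentials is not specified in the abstract, so those details are left out here.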

1 Citation

0 Influential

3.5 Altmetric

18.5 Score
