2601.02666v1 Jan 06, 2026 cs.AI

Inferring Causal Graph Temporal Logic Formulas to Expedite Reinforcement Learning in Temporally Extended Tasks

Original Abstract

Decision-making tasks often unfold on graphs with spatial-temporal dynamics. Black-box reinforcement learning often overlooks how local changes spread through network structure, limiting sample efficiency and interpretability. We present GTL-CIRL, a closed-loop framework that simultaneously learns policies and mines Causal Graph Temporal Logic (Causal GTL) specifications. The method shapes rewards with robustness, collects counterexamples when effects fail, and uses Gaussian Process (GP) driven Bayesian optimization to refine parameterized cause templates. The GP models capture spatial and temporal correlations in the system dynamics, enabling efficient exploration of complex parameter spaces. Case studies in gene and power networks show faster learning and clearer, verifiable behavior compared to standard RL baselines.
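
The abstract says the method "shapes rewards with robustness" but gives no formula. As a rough illustration only, the sketch below uses the standard quantitative semantics of signal temporal logic (a predicate's margin, `max` for "eventually", `min` for "always") on a single node's trace, and adds the result to the environment reward. All function names, the `weight` parameter, and the additive shaping rule are assumptions made for this example; they are not taken from the paper, and the graph and causal operators of Causal GTL are not modeled.

```python
from typing import Sequence


def rob_atom(trace: Sequence[float], t: int, threshold: float) -> float:
    """Robustness of the predicate trace[t] >= threshold (positive means satisfied)."""
    return trace[t] - threshold


def rob_eventually(trace: Sequence[float], start: int, end: int, threshold: float) -> float:
    """Robustness of 'eventually in [start, end], value >= threshold'.

    Assumes start < len(trace); the window is clipped to the trace length.
    """
    last = min(end, len(trace) - 1)
    return max(rob_atom(trace, t, threshold) for t in range(start, last + 1))


def rob_always(trace: Sequence[float], start: int, end: int, threshold: float) -> float:
    """Robustness of 'always in [start, end], value >= threshold'.

    Assumes start < len(trace); the window is clipped to the trace length.
    """
    last = min(end, len(trace) - 1)
    return min(rob_atom(trace, t, threshold) for t in range(start, last + 1))


def shaped_reward(env_reward: float, robustness: float, weight: float = 0.1) -> float:
    """Hypothetical additive shaping: environment reward plus weighted robustness."""
    return env_reward + weight * robustness


if __name__ == "__main__":
    # Toy trace of one node's signal over three time steps.
    node_trace = [0.2, 0.5, 0.9]
    rho = rob_eventually(node_trace, 0, 2, threshold=0.8)
    print("eventually robustness:", rho)
    print("always robustness:", rob_always(node_trace, 0, 2, threshold=0.8))
    print("shaped reward:", shaped_reward(1.0, rho))
```

A positive robustness value means the trace satisfies the formula with that margin, so a shaping term of this kind rewards trajectories in proportion to how strongly they satisfy the specification rather than only when they finish the task.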
