2602.09207v1 Feb 09, 2026 cs.LG

CausalGDP: Causality-Guided Diffusion Policies for Reinforcement Learning

Xiao Hu (Citations: 0, h-index: 0)

Yang Ye (Citations: 0, h-index: 0)

Xubo Yue (Citations: 10, h-index: 2)

Xiaofeng Xiao (Citations: 39, h-index: 3)

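Reinforcement learning (RL) has achieved remarkable success across a range of sequential decision-making problems. Recent diffusion-based policies advance RL further by modeling complex, high-dimensional action distributions. However, existing diffusion-based policies rely mainly on statistical associations and do not explicitly account for the causal relationships among states, actions, and rewards, which limits their ability to identify which action components actually produce high rewards. This paper proposes the Causality-guided Diffusion Policy (CausalGDP), a unified framework that integrates causal reasoning into reinforcement learning. CausalGDP first uses offline data to learn a base diffusion policy and an initial causal dynamics model, capturing the causal dependencies among states, actions, and rewards. During real-time interaction, the causal information is continuously updated and used as a guidance signal that steers the diffusion process toward actions that causally influence future states and rewards. By explicitly considering causality in addition to association, CausalGDP focuses policy optimization on the action components that actually drive performance improvements. Experiments show that CausalGDP consistently achieves competitive or superior performance compared with state-of-the-art diffusion-based and offline RL methods, particularly on complex, high-dimensional control tasks.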

Original Abstract

Reinforcement learning (RL) has achieved remarkable success in a wide range of sequential decision-making problems. Recent diffusion-based policies further improve RL by modeling complex, high-dimensional action distributions. However, existing diffusion policies primarily rely on statistical associations and fail to explicitly account for causal relationships among states, actions, and rewards, limiting their ability to identify which action components truly cause high returns. In this paper, we propose Causality-guided Diffusion Policy (CausalGDP), a unified framework that integrates causal reasoning into diffusion-based RL. CausalGDP first learns a base diffusion policy and an initial causal dynamical model from offline data, capturing causal dependencies among states, actions, and rewards. During real-time interaction, the causal information is continuously updated and incorporated as a guidance signal to steer the diffusion process toward actions that causally influence future states and rewards. By explicitly considering causality beyond association, CausalGDP focuses policy optimization on action components that genuinely drive performance improvements. Experimental results demonstrate that CausalGDP consistently achieves competitive or superior performance over state-of-the-art diffusion-based and offline RL methods, especially in complex, high-dimensional control tasks.
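
The abstract describes guidance only at a high level, so the following is a toy sketch of the general idea rather than the authors' algorithm. It assumes a standard-normal stand-in for the base policy, a learned reward model with one spurious term, and a hypothetical per-dimension vector `causal_weights` that a causal model might supply. The update is a simplified Langevin-style step, not a full denoising schedule, and every function name here is illustrative.

```python
import random


def base_score(action):
    # Stand-in for the base policy: score of N(0, I), i.e. grad log p(a) = -a.
    return [-a for a in action]


def learned_reward_grad(action, target=1.0):
    # Gradient of an assumed learned reward r_hat(a) = -sum_i (a_i - target)^2.
    # In this toy setup only a[0] truly affects reward; the a[1] term is a
    # spurious association that plain (non-causal) guidance would follow.
    return [-2.0 * (a - target) for a in action]


def guided_sample(causal_weights, guidance_scale=1.0, steps=200,
                  step_size=0.1, noise_scale=0.0, seed=0):
    """Langevin-style sampling with per-dimension causal weighting of guidance."""
    rng = random.Random(seed)
    action = [rng.gauss(0.0, 1.0) for _ in causal_weights]
    for _ in range(steps):
        score = base_score(action)
        grad = learned_reward_grad(action)
        action = [
            # Guidance is scaled by w, so dimensions with w = 0 follow the base policy only.
            a + step_size * (s + guidance_scale * w * g)
            + noise_scale * rng.gauss(0.0, 1.0)
            for a, s, w, g in zip(action, score, causal_weights, grad)
        ]
    return action


if __name__ == "__main__":
    print("causal guidance:", guided_sample([1.0, 0.0]))
    print("plain guidance: ", guided_sample([1.0, 1.0]))
```

With noise disabled, the causally weighted run moves only the first action component toward the reward target and leaves the second at the base-policy mean, while uniform weights pull both components, including the spurious one. How CausalGDP actually estimates and updates its causal signal is not specified in the abstract.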

0 Citations

0 Influential

1.5 Altmetric

7.5 Score
