2603.13707v2 Mar 14, 2026 cs.RO

REFINE-DP: Diffusion Policy Fine-tuning for Humanoid Loco-manipulation via Reinforcement Learning

Zhaoyuan Gu (Citations: 269, h-index: 7)

Yipu Chen, Georgia Institute of Technology (Citations: 40, h-index: 2)

Zimeng Chai (Citations: 2, h-index: 1)

A. Cueva (Citations: 15, h-index: 2)

Thong Q. Nguyen (Citations: 19, h-index: 2)

Yifan Wu (Citations: 6, h-index: 2)

Minji Kim (Citations: 31, h-index: 2)

Isaac Legene (Citations: 2, h-index: 1)

Fukang Liu (Citations: 87, h-index: 3)

Matthew Kim (Citations: 43, h-index: 3)

Ayan Barula (Citations: 2, h-index: 1)

Yongxin Chen (Citations: 173, h-index: 5)

Ye Zhao (Citations: 27, h-index: 3)

Hui Xue (Citations: 44, h-index: 4)

Original Abstract

Humanoid loco-manipulation requires coordinating high-level motion plans with stable low-level whole-body execution under complex robot-environment dynamics and over long-horizon tasks. While diffusion policies (DPs) show promise for learning from demonstrations, deploying them on humanoids poses critical challenges: the motion planner trained offline is decoupled from the low-level controller, leading to poor command tracking, compounding distribution shift, and task failures. The common approach of scaling demonstration data is prohibitively expensive for high-dimensional humanoid systems. To address this challenge, we present REFINE-DP (REinforcement learning FINE-tuning of Diffusion Policy), a hierarchical framework that jointly optimizes a DP high-level planner and an RL-based low-level loco-manipulation controller. The DP is fine-tuned via a PPO-based diffusion policy gradient to improve task success rate, while the controller is simultaneously updated to accurately track the planner's evolving command distribution, reducing the distributional mismatch that degrades motion quality. We validate REFINE-DP on a humanoid robot performing loco-manipulation tasks, including door traversal and long-horizon object transport. REFINE-DP achieves a success rate above 90% in simulation, even in out-of-distribution cases not seen in the pre-training data, and enables smooth autonomous task execution in real-world dynamic environments. Our proposed method substantially outperforms pre-trained DP baselines and demonstrates that RL fine-tuning is key to reliable humanoid loco-manipulation. Project page: https://refine-dp.github.io/REFINE-DP/
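
The abstract does not spell out the fine-tuning objective. As a rough illustration only, the sketch below assumes the standard PPO clipped surrogate evaluated on Gaussian log-likelihoods of individual denoising steps, which is one common way to build a policy gradient for a diffusion policy. The function names, the fixed standard deviation, and the toy numbers are hypothetical and are not taken from the paper or its code.

```python
import math


def gaussian_log_prob(x, mean, std):
    """Log density of an isotropic Gaussian, summed over dimensions.

    Assumption: each denoising step is treated as a Gaussian "action"
    whose mean comes from the denoiser and whose std is fixed.
    """
    return sum(
        -0.5 * ((xi - mi) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)
        for xi, mi in zip(x, mean)
    )


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate over a batch of denoising steps."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # likelihood ratio new / old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)  # pessimistic bound
    return total / len(advantages)


# Toy example: one denoised sample scored under the old and updated denoiser means.
sample = [0.1, -0.2]
old_lp = gaussian_log_prob(sample, mean=[0.0, 0.0], std=0.5)
new_lp = gaussian_log_prob(sample, mean=[0.05, -0.1], std=0.5)
objective = ppo_clip_objective([new_lp], [old_lp], advantages=[1.0])
```

In a full training loop this quantity would be maximized with respect to the denoiser parameters, with advantages estimated from task rewards. How REFINE-DP estimates advantages and how it schedules the concurrent controller update are not described in the abstract and are left out here.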

2 Citations

0 Influential

3.5 Altmetric

19.5 Score
