2603.15388v1 Mar 16, 2026 cs.LG

Efficient Morphology-Control Co-Design via Stackelberg Proximal Policy Optimization

Yuhui Wang

Citations: 31

h-index: 4

Yan Dai

Citations: 29

h-index: 2

Dylan R. Ashley

Citations: 170

h-index: 3

Jürgen Schmidhuber

Citations: 1,790

h-index: 8

Original Abstract

Morphology-control co-design concerns the coupled optimization of an agent's body structure and control policy. This problem exhibits a bi-level structure, where the control dynamically adapts to the morphology to maximize performance. Existing methods typically neglect the control's adaptation dynamics by adopting a single-level formulation that treats the control policy as fixed when optimizing morphology. This can lead to inefficient optimization, as morphology updates may be misaligned with control adaptation. In this paper, we revisit the co-design problem from a game-theoretic perspective, modeling the intrinsic coupling between morphology and control as a novel variant of a Stackelberg game. We propose Stackelberg Proximal Policy Optimization (Stackelberg PPO), which explicitly incorporates the control's adaptation dynamics into morphology optimization. By modeling this intrinsic coupling, our method aligns morphology updates with control adaptation, thereby stabilizing training and improving learning efficiency. Experiments across diverse co-design tasks demonstrate that Stackelberg PPO outperforms standard PPO in both stability and final performance, opening the way for dramatically more efficient robotics designs.
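
The bi-level structure described in the abstract can be written out as a short sketch. The notation is assumed for illustration and is not taken from the paper: $m$ is the morphology (leader), $\pi$ the control policy (follower), $J$ the shared return, and $\pi(m)$ the control's response to a given morphology. The first line states the bi-level problem, the second is the gradient a single-level formulation would use, and the third is the total derivative that also accounts for how the control responds to a morphology change.

```latex
\begin{align}
\text{bi-level problem:}\quad
  & \max_{m}\; J\bigl(m, \pi^{*}(m)\bigr)
    \quad \text{s.t.}\quad \pi^{*}(m) \in \arg\max_{\pi} J(m, \pi), \\
\text{single-level gradient:}\quad
  & \nabla_{m} J(m, \pi)\Big|_{\pi\ \text{fixed}}
    = \frac{\partial J}{\partial m}, \\
\text{total derivative:}\quad
  & \frac{\mathrm{d}}{\mathrm{d}m}\, J\bigl(m, \pi(m)\bigr)
    = \frac{\partial J}{\partial m}
    + \Bigl(\frac{\partial \pi}{\partial m}\Bigr)^{\!\top}
      \frac{\partial J}{\partial \pi}.
\end{align}
```

The second term of the total derivative is what a fixed-policy formulation drops. How Stackelberg PPO estimates or approximates that term is not specified in the abstract, so this block should be read only as the generic leader-follower gradient, not as the paper's update rule.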
