2605.05951v1 May 07, 2026 cs.AI

HaM-World: Soft-Hamiltonian World Models with Selective Memory for Planning

Kun Wang

Hao Tang

Haodong Cui

Zhandong Mei

Ke Xu

Original Abstract

World models enable model-based planning through learned latent dynamics, but imagined rollouts become unstable as the planning horizon grows or the dynamics distribution shifts. We argue that this instability reflects two structures missing from planner-facing latents: history-conditioned memory for approximate Markov completeness, and a geometric organization that separates configuration, momentum, and task semantics. We propose HaM-World (HMW), a structured world model that decomposes the latent state into a canonical (q, p) subspace and a context subspace c, and uses Mamba selective state-space memory as the history-conditioned input to the same latent dynamics. Within this interface, (q, p) evolves under an energy-derived Hamiltonian vector field plus learnable residual/control dynamics, while c captures semantic, dissipative, and non-conservative factors. This gives the planner a single latent state shared by dynamics prediction, reward/value estimation, imagined rollouts, and CEM action search. On four DeepMind Control Suite tasks, HaM-World reaches the highest Avg. AUC (117.9, +9.5%), reduces long-horizon rollout error to 45% of that of a strong baseline model, and wins 11 of the 12 MSE cells for k ∈ {3, 5, 7}. Under 12 OOD perturbations spanning dynamics shifts, action delay, and observation masking, HaM-World achieves the highest return in every condition, with average OOD-return gains of 10.2% on Finger Spin and 13.6% on Reacher Easy. Mechanism diagnostics further show bounded action-free Hamiltonian-energy drift, structured energy variation under policy rollouts, and coherent control-induced energy transfer, supporting the intended Soft-Hamiltonian dynamics design.
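The abstract says that (q, p) evolves under an energy-derived Hamiltonian vector field plus learnable residual and control terms, and it reports bounded action-free energy drift. The following is a minimal sketch of that update shape, not the paper's implementation.

It rests on loudly stated assumptions:

- The energy H is a fixed quadratic stand-in, whereas HaM-World learns it.
- The integrator is semi-implicit (symplectic) Euler.
- The control map B is a scaled identity.
- The residual is a hypothetical callable, used here for dissipation read from c.
- The context c is simply passed through.

```python
from typing import Callable, List, Optional, Tuple

Vec = List[float]
# Hypothetical residual signature: (q, p, c) -> extra force added to dp/dt.
Residual = Callable[[Vec, Vec, Vec], Vec]

STIFFNESS = 1.0  # assumption: fixed stand-in for a learned potential V(q) = k * |q|^2 / 2


def energy(q: Vec, p: Vec) -> float:
    """Stand-in energy H(q, p) = |p|^2 / 2 + k * |q|^2 / 2 (HaM-World learns H instead)."""
    return 0.5 * sum(x * x for x in p) + 0.5 * STIFFNESS * sum(x * x for x in q)


def soft_hamiltonian_step(
    q: Vec, p: Vec, c: Vec, action: Vec,
    dt: float = 0.05, residual: Optional[Residual] = None, control_gain: float = 1.0,
) -> Tuple[Vec, Vec, Vec]:
    """One semi-implicit Euler step of a soft-Hamiltonian latent update.

    dp/dt = -dH/dq + residual(q, p, c) + B a   (B = control_gain * I, an assumption)
    dq/dt =  dH/dp, evaluated at the updated momentum (this makes the step symplectic)
    The context c is returned unchanged; a learned model would update it.
    """
    force = [-STIFFNESS * x for x in q]  # -dH/dq for the stand-in energy
    extra = residual(q, p, c) if residual is not None else [0.0] * len(p)
    p_next = [pi + dt * (f + r + control_gain * a) for pi, f, r, a in zip(p, force, extra, action)]
    q_next = [qi + dt * pn for qi, pn in zip(q, p_next)]  # dH/dp = p for the stand-in
    return q_next, p_next, list(c)


def rollout_energies(q: Vec, p: Vec, c: Vec, actions: List[Vec],
                     residual: Optional[Residual] = None, dt: float = 0.05) -> List[float]:
    """Roll the latent forward and record H at every step."""
    energies = [energy(q, p)]
    for a in actions:
        q, p, c = soft_hamiltonian_step(q, p, c, a, dt=dt, residual=residual)
        energies.append(energy(q, p))
    return energies


def relative_drift(energies: List[float]) -> float:
    """Max |H_t - H_0| / H_0 over a rollout: a simple action-free drift diagnostic."""
    h0 = energies[0]
    return max(abs(h - h0) for h in energies) / h0


if __name__ == "__main__":
    zero_actions = [[0.0]] * 400
    free = rollout_energies([1.0], [0.0], [0.5], zero_actions)
    print(f"action-free relative drift: {relative_drift(free):.4f}")  # bounded, a few percent

    # Hypothetical dissipative residual whose damping rate is read from the context c.
    damp = lambda q, p, c: [-c[0] * pi for pi in p]
    damped = rollout_energies([1.0], [0.0], [0.5], zero_actions, residual=damp)
    print(f"damped energy: {damped[0]:.3f} -> {damped[-1]:.6f}")
```

The symplectic step is chosen so that, for this quadratic stand-in, the action-free energy oscillates within a band of a few percent instead of growing. An explicit Euler step would multiply the energy by (1 + dt²) every step. Dissipation enters only through the residual, mirroring the abstract's split between conservative (q, p) dynamics and non-conservative factors carried by c.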
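HaM-World feeds Mamba selective state-space memory into the latent dynamics as a history-conditioned input. Below is a minimal single-channel sketch of a Mamba-style selective state-space recurrence. It is not the Mamba library API and not the paper's configuration.

The recurrence works as follows:

- The step size Δ_t = softplus(w_Δ · x_t + b_Δ) depends on the input.
- The diagonal state matrix A is discretized as exp(Δ_t A).
- The input term uses the common simplification Δ_t · B · x_t.

For brevity, B is held fixed and the output projection C is omitted; in Mamba both B and C are input-dependent. The names `selective_memory`, `w_delta`, `b_delta` and `w_b` are hypothetical. The returned state stands in for the history summary the latent dynamics would consume.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def selective_memory(
    xs: List[float], a: List[float], w_delta: float, b_delta: float, w_b: List[float]
) -> List[List[float]]:
    """Run a single-channel selective SSM over inputs xs; return the state after each step.

    a:       diagonal entries of the continuous state matrix A (negative for stability)
    Delta_t = softplus(w_delta * x_t + b_delta)            # input-dependent step size
    h_t     = exp(Delta_t * a) * h_{t-1} + Delta_t * w_b * x_t   # simplified discretization
    """
    h = [0.0] * len(a)
    states = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)
        h = [math.exp(delta * ai) * hi + delta * bi * x for ai, hi, bi in zip(a, h, w_b)]
        states.append(h)
    return states


if __name__ == "__main__":
    spike = [1.0, 0.0, 0.0, 0.0, 0.0]
    for b in (-3.0, 0.0, 2.0):  # small, medium and large step sizes
        states = selective_memory(spike, a=[-1.0], w_delta=0.0, b_delta=b, w_b=[1.0])
        retained = states[-1][0] / states[0][0]
        print(f"b_delta={b:+.1f}: fraction of the spike retained after 4 steps = {retained:.3f}")
```

Because Δ_t is computed from the input, the same recurrence can keep information for many steps (small Δ_t) or overwrite it quickly (large Δ_t). This input-dependent gating is what "selective" refers to. With `w_delta=0.0` as in the usage loop, only the bias controls Δ_t, which isolates the retention effect.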
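The abstract states that imagined rollouts and CEM action search operate on one shared latent state. The sketch below shows a generic cross-entropy method planner under stated assumptions:

- `score` is a hypothetical callable that would roll an action sequence through the learned latent dynamics and sum the predicted rewards. HaM-World's exact planning objective, for example any value bootstrap, is not specified here.
- `toy_latent_return` is a purely illustrative 1-D stand-in model.
- The population size, elite count and iteration count are arbitrary choices.

```python
import random
import statistics
from typing import Callable, List


def cem_plan(
    score: Callable[[List[float]], float],
    horizon: int,
    iterations: int = 6,
    population: int = 200,
    num_elites: int = 20,
    init_std: float = 1.0,
    bound: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Cross-entropy method over open-loop sequences of 1-D actions.

    Each iteration samples sequences from a diagonal Gaussian, clips them to
    [-bound, bound], keeps the highest-scoring elites, and refits the Gaussian.
    Returns the final mean; an MPC loop would execute only its first action.
    """
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [init_std] * horizon
    for _ in range(iterations):
        scored = []
        for _ in range(population):
            seq = [min(bound, max(-bound, rng.gauss(m, s))) for m, s in zip(mean, std)]
            scored.append((score(seq), seq))
        scored.sort(key=lambda item: item[0], reverse=True)
        elites = [seq for _, seq in scored[:num_elites]]
        mean = [statistics.fmean(col) for col in zip(*elites)]
        # Small floor keeps the sampling distribution from collapsing to a point.
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elites)]
    return mean


def toy_latent_return(actions: List[float], start: float = 0.0, goal: float = 1.0) -> float:
    """Hypothetical stand-in for an imagined rollout: s' = s + 0.5 * a, r = -(s' - goal)^2."""
    s, total = start, 0.0
    for a in actions:
        s = s + 0.5 * a
        total += -(s - goal) ** 2
    return total


if __name__ == "__main__":
    plan = cem_plan(toy_latent_return, horizon=5)
    print("first action:", round(plan[0], 3))
    print("planned return:", round(toy_latent_return(plan), 3))
```

CEM needs only forward evaluations of `score`, so the planner never differentiates through the model. Only sampled rollouts from the shared latent state are required.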
