2601.21718v1 Jan 29, 2026 cs.LG

When does predictive inverse dynamics outperform behavior cloning?

L. França

Citations: 8

h-index: 2

Alex Lamb

Citations: 78

h-index: 6

John Langford

Citations: 125

h-index: 7

Lukas Schafer

Citations: 18

h-index: 2

Pallavi Choudhury

Citations: 969

h-index: 7

Abdelhak Lemkhenter

Citations: 62

h-index: 2

Chris Lovett

Citations: 8

h-index: 2

Somjit Nath

Citations: 258

h-index: 8

Matheus Ribeiro Furtado de Mendonça

Citations: 8

h-index: 2

Riashat Islam

Citations: 38

h-index: 3

Siddhartha Sen

Citations: 8

h-index: 2

K. Hofmann

Citations: 314

h-index: 7

Sergio Valcarcel Macua

Citations: 1,075

h-index: 12

Abstract

Behavior cloning (BC) is a practical offline imitation learning method, but it often fails when expert demonstrations are limited. Recent work has introduced a class of architectures named predictive inverse dynamics models (PIDM) that combine a future state predictor with an inverse dynamics model (IDM). While PIDM often outperforms BC, the reasons behind its benefits remain unclear. In this paper, we provide a theoretical explanation: PIDM introduces a bias-variance tradeoff. While predicting the future state introduces bias, conditioning the IDM on the prediction can significantly reduce variance. We establish conditions on the state predictor bias for PIDM to achieve lower prediction error and higher sample efficiency than BC, with the gap widening when additional data sources are available. We validate the theoretical insights empirically in 2D navigation tasks, where BC requires up to five times (three times on average) more demonstrations than PIDM to reach comparable performance, and in a complex 3D environment in a modern video game with high-dimensional visual inputs and stochastic transitions, where BC requires over 66% more samples than PIDM.
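
As background for the tradeoff named in the abstract, the textbook bias-variance decomposition of mean squared error may help. This is the standard identity for an estimator of a fixed target, not the paper's own bound. The symbols \hat{a} (a learned action prediction) and a (the expert action it targets) are assumptions introduced here for illustration.

```latex
\mathbb{E}\big[(\hat{a} - a)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{a}] - a\big)^2}_{\text{squared bias}}
  + \underbrace{\mathbb{E}\big[(\hat{a} - \mathbb{E}[\hat{a}])^2\big]}_{\text{variance}}
```

Read in these terms, the abstract's claim is that PIDM accepts a nonzero first term, arising from error in the predicted future state, in exchange for a smaller second term, and that it comes out ahead of BC when the state predictor bias is small enough.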

2 Citations

0 Influential

6 Altmetric

32.0 Score
