2602.13347v1 Feb 12, 2026 cs.CV

Visual Foresight for Robotic Stow: A Diffusion-Based World Model from Sparse Snapshots

Nikhil Chacko (Citations: 5, h-index: 1)

Petter Nilsson (Citations: 7, h-index: 2)

Ruinian Xu (Citations: 5, h-index: 1)

Shantanu Thakar (Citations: 1,232, h-index: 16)

Baichuan Lou (Citations: 186, h-index: 5)

Zhebin Zhang (Citations: 106, h-index: 6)

Mudit Agrawal (Citations: 5, h-index: 1)

Bhavana Chandrashekhar (Citations: 7, h-index: 2)

A. Parness (Citations: 6, h-index: 1)

Lijun Zhang (Citations: 62, h-index: 3)

Harpreet Sawhney (Citations: 43, h-index: 4)

Original Abstract

Automated warehouses execute millions of stow operations, where robots place objects into storage bins. For these systems it is valuable to anticipate how a bin will look from the current observations and the planned stow behavior before real execution. We propose FOREST, a stow-intent-conditioned world model that represents bin states as item-aligned instance masks and uses a latent diffusion transformer to predict the post-stow configuration from the observed context. Our evaluation shows that FOREST substantially improves the geometric agreement between predicted and true post-stow layouts compared with heuristic baselines. We further evaluate the predicted post-stow layouts in two downstream tasks, in which replacing the real post-stow masks with FOREST predictions causes only modest performance loss in load-quality assessment and multi-stow reasoning, indicating that our model can provide useful foresight signals for warehouse planning.
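The abstract speaks of "geometric agreement between predicted and true post-stow layouts" without naming a metric. As an illustration only, one common way to score agreement between two item-aligned instance masks is per-item intersection-over-union (IoU). The sketch below assumes that choice; the mask encoding (0 for background, a positive integer per item) and the function names `per_item_iou` and `mean_iou` are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Assumed encoding: 0 = background, k > 0 = pixels belonging to item k.
# Item IDs are assumed to refer to the same item in both masks.
Mask = List[List[int]]


def per_item_iou(pred: Mask, true: Mask) -> Dict[int, float]:
    """Return the IoU of each item that appears in either mask."""
    inter: Dict[int, int] = {}
    area_pred: Dict[int, int] = {}
    area_true: Dict[int, int] = {}
    for row_p, row_t in zip(pred, true):
        for p, t in zip(row_p, row_t):
            if p:
                area_pred[p] = area_pred.get(p, 0) + 1
            if t:
                area_true[t] = area_true.get(t, 0) + 1
            if p and p == t:
                inter[p] = inter.get(p, 0) + 1
    ious: Dict[int, float] = {}
    for item in set(area_pred) | set(area_true):
        i = inter.get(item, 0)
        # The union is never zero: the item occupies at least one pixel.
        union = area_pred.get(item, 0) + area_true.get(item, 0) - i
        ious[item] = i / union
    return ious


def mean_iou(pred: Mask, true: Mask) -> float:
    """Average the per-item IoU; two empty bins count as perfect agreement."""
    ious = per_item_iou(pred, true)
    return sum(ious.values()) / len(ious) if ious else 1.0


# Toy 2x4 bin: item 1 loses one pixel in the prediction, item 2 matches.
true_mask = [[1, 1, 0, 0],
             [1, 1, 2, 2]]
pred_mask = [[1, 1, 0, 0],
             [1, 0, 2, 2]]
print(per_item_iou(pred_mask, true_mask))
print(mean_iou(pred_mask, true_mask))
```

Averaging per item rather than per pixel keeps a small item from being swamped by a large one, which seems a reasonable choice when each stowed item matters to the layout; the paper may well use a different measure.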
