2603.14941v1 Mar 16, 2026 cs.AI

RS-WorldModel: A Unified Model for Remote Sensing Understanding and Future Forecasting

RS-WorldModel: a Unified Model for Remote Sensing Understanding and Future Sense Forecasting

Gang Xu (Citations: 46, h-index: 4)

Zhongan Wang (Citations: 9, h-index: 1)

Feiyu Shen (Citations: 215, h-index: 2)

Huiping Zhuang (Citations: 11, h-index: 2)

Haifeng Li (Citations: 23, h-index: 2)

Linrui Xu (Citations: 13, h-index: 2)

Ming Li (Citations: 21, h-index: 3)

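Remote sensing world models aim to explain observed changes and forecast plausible futures, two tasks that share spatiotemporal priors. Existing methods, however, usually handle these tasks separately, which limits transfer between them. This paper presents RS-WorldModel, a unified world model for remote sensing that jointly handles spatiotemporal change understanding and text-guided future scene forecasting, and builds RSWBench-1.1M, a dataset of 1.1 million samples with rich language annotations covering both tasks. RS-WorldModel is trained in three stages: (1) Geo-Aware Generative Pre-training (GAGP) conditions forecasting on geographic and acquisition metadata; (2) synergistic instruction tuning (SIT) trains understanding and forecasting jointly; (3) verifiable reinforcement optimization (VRO) refines outputs using verifiable, task-specific rewards. With 2 billion parameters, RS-WorldModel outperforms open-source models up to 120× larger on most spatiotemporal change question-answering metrics. It also reaches an FID of 43.13 on text-guided future scene forecasting, outperforming all open-source baselines as well as the closed-source Gemini-2.5-Flash Image (Nano Banana).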

Original Abstract

Remote sensing world models aim to both explain observed changes and forecast plausible futures, two tasks that share spatiotemporal priors. Existing methods, however, typically address them separately, limiting cross-task transfer. We present RS-WorldModel, a unified world model for remote sensing that jointly handles spatiotemporal change understanding and text-guided future scene forecasting, and we build RSWBench-1.1M, a 1.1 million sample dataset with rich language annotations covering both tasks. RS-WorldModel is trained in three stages: (1) Geo-Aware Generative Pre-training (GAGP) conditions forecasting on geographic and acquisition metadata; (2) synergistic instruction tuning (SIT) jointly trains understanding and forecasting; (3) verifiable reinforcement optimization (VRO) refines outputs with verifiable, task-specific rewards. With only 2B parameters, RS-WorldModel surpasses open-source models up to 120× larger on most spatiotemporal change question-answering metrics. It achieves an FID of 43.13 on text-guided future scene forecasting, outperforming all open-source baselines as well as the closed-source Gemini-2.5-Flash Image (Nano Banana).
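For readers unfamiliar with the FID figure quoted in the abstract: FID is the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated images, and lower values indicate closer distributions. The sketch below shows only that distance computation. It assumes the features have already been extracted (conventionally from an Inception network, though the abstract does not state the evaluation protocol), and the function name `frechet_distance` and its arguments are hypothetical, not taken from the paper.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    Both inputs are arrays of shape (n_samples, n_features). How the
    features are extracted is outside this sketch (an assumption).
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((sigma_r @ sigma_g)^(1/2)) equals the sum of the square roots of
    # the eigenvalues of sigma_r @ sigma_g, which are real and non-negative
    # for covariance matrices; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g)
                 - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))
    # Identical sets give a distance of about 0; a pure shift of the mean
    # leaves the covariance term at 0 and adds the squared shift length.
    print(frechet_distance(real, real))
    print(frechet_distance(real, real + 3.0))
```

Computing the trace term from eigenvalues avoids a matrix square root routine and keeps the sketch dependent on NumPy alone; production FID implementations typically use a dedicated matrix square root and a fixed feature extractor so that scores are comparable across papers.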

