2604.17504v1 Apr 19, 2026 cs.CV

RS-HyRe-R1: A Hybrid Reward Mechanism to Overcome Perceptual Inertia for Remote Sensing Images Understanding

Huajun He, Peng Shen, Wang Guo, Haifeng Li, Gaozhi Zhou, Liujue Zhang, Linrui Xu, Zeyu Wang, Ziyu Li, Xuezhi Cui, Jipeng Zhang

Original Abstract

Reinforcement learning (RL) post-training substantially improves remote sensing vision-language models (RS-VLMs). However, when handling complex remote sensing imagery (RSI) requiring exhaustive visual scanning, models tend to rely on localized salient cues for rapid inference. We term this RL-induced bias "perceptual inertia". Driven by reward maximization, models favor quick outcome fitting, leading to two limitations: cognitively, overreliance on specific features impedes complete evidence construction; operationally, models struggle to flexibly shift visual focus across tasks. To address this bias and encourage comprehensive visual evidence mining, we propose RS-HyRe-R1, a hybrid reward framework for RSI understanding. It introduces: (1) a spatial reasoning activation reward that enforces structured visual reasoning; (2) a perception correctness reward that provides adaptive quality anchors across RS tasks, ensuring accurate geometric and semantic alignment; and (3) a visual-semantic path evolution reward that penalizes repetitive reasoning and promotes exploration of complementary cues to build richer evidence chains. Experiments show RS-HyRe-R1 effectively mitigates "perceptual inertia", encouraging deeper, more diverse reasoning. With only 3B parameters, it achieves state-of-the-art performance on REC, OVD, and VQA tasks, outperforming models with up to 7B parameters. It also demonstrates strong zero-shot generalization, surpassing the second-best model by 3.16%, 3.97%, and 2.72% on VQA, OVD, and REC, respectively. Code and datasets are available at https://github.com/geox-lab/RS-HyRe-R1.
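
The abstract names three reward components but does not give their formulas. The sketch below is only an illustration of how a hybrid reward of that shape could be combined for a REC-style sample. It is not the authors' implementation: the `<think>`/`<answer>` tag format, the IoU-based correctness term, the n-gram repetition measure, the weights, and all function names are assumptions made for this example.

```python
import re


def format_reward(response: str) -> float:
    """Return 1.0 if the response is a <think> block followed by an <answer> block.

    The tag format is a hypothetical stand-in for a structured-reasoning check.
    """
    pattern = r"\s*<think>.+?</think>\s*<answer>.+?</answer>\s*"
    return 1.0 if re.fullmatch(pattern, response, flags=re.DOTALL) else 0.0


def box_iou(a, b) -> float:
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def repetition_penalty(reasoning: str, n: int = 3) -> float:
    """Fraction of word n-grams that repeat an earlier n-gram, in [0, 1]."""
    words = reasoning.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def hybrid_reward(response, pred_box, gt_box, weights=(1.0, 1.0, 0.5)) -> float:
    """Weighted sum: format term + correctness term - repetition term.

    The weights are arbitrary illustrative values, not taken from the paper.
    """
    w_fmt, w_acc, w_rep = weights
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    reasoning = match.group(1) if match else ""
    return (
        w_fmt * format_reward(response)
        + w_acc * box_iou(pred_box, gt_box)
        - w_rep * repetition_penalty(reasoning)
    )
```

For a well-formed response with a perfectly localized box, the first two terms each contribute their full weight, and any repeated phrasing inside the reasoning block lowers the total. Other tasks named in the abstract (OVD, VQA) would need a different correctness term in place of `box_iou`.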
