2603.13628v1 Mar 13, 2026 cs.CV

Locatability-Guided Adaptive Reasoning for Image Geo-Localization with Vision-Language Models

Yiming Liu (citations: 57, h-index: 3)
Xiaofang Zhou (citations: 43, h-index: 2)
Boting Yu (citations: 12, h-index: 1)
Fengze Yang (citations: 41, h-index: 4)
Chao Wang (citations: 15, h-index: 3)
Xuewen Luo (citations: 36, h-index: 3)
Tao Li (citations: 1, h-index: 1)
Ruimin Ke (citations: 141, h-index: 4)
Chenxi Liu (citations: 24, h-index: 2)

Abstract

The emergence of Vision-Language Models (VLMs) has introduced new paradigms for global image geo-localization through retrieval-augmented generation (RAG) and reasoning-driven inference. However, RAG methods are constrained by retrieval database quality, while reasoning-driven approaches fail to internalize image locatability, relying on inefficient, fixed-depth reasoning paths that increase hallucinations and degrade accuracy. To overcome these limitations, we introduce an Optimized Locatability Score that quantifies an image's suitability for deep reasoning in geo-localization. Using this metric, we curate Geo-ADAPT-51K, a locatability-stratified reasoning dataset enriched with augmented reasoning trajectories for complex visual scenes. Building on this foundation, we propose a two-stage Group Relative Policy Optimization (GRPO) curriculum with customized reward functions that regulate adaptive reasoning depth, visual grounding, and hierarchical geographical accuracy. Our framework, Geo-ADAPT, learns an adaptive reasoning policy, achieves state-of-the-art performance across multiple geo-localization benchmarks, and substantially reduces hallucinations by reasoning both adaptively and efficiently.
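The abstract names a GRPO curriculum with a reward for hierarchical geographical accuracy but does not give the reward definitions. The following is only a minimal sketch of how such a pipeline might look, not the authors' implementation. It assumes a tiered distance reward over the thresholds commonly used in geo-localization benchmarks (1, 25, 200, 750, 2500 km) and the standard group-relative advantage used in GRPO, where each sampled response's reward is normalized by the mean and standard deviation of its group. The names `TIERS`, `hierarchical_reward`, and `group_relative_advantages`, as well as the reward values per tier, are hypothetical.

```python
import math
from statistics import mean, pstdev

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical (threshold_km, reward) tiers: street, city, region, country, continent.
# The paper's actual reward values are not stated in the abstract.
TIERS = [(1.0, 1.0), (25.0, 0.8), (200.0, 0.6), (750.0, 0.4), (2500.0, 0.2)]


def hierarchical_reward(pred, truth):
    """Return the reward of the finest tier whose distance threshold the prediction meets."""
    distance = haversine_km(pred[0], pred[1], truth[0], truth[1])
    for threshold_km, reward in TIERS:
        if distance <= threshold_km:
            return reward
    return 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an implementation choice in this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    truth = (48.8566, 2.3522)  # Paris
    group = [(48.8566, 2.3522), (51.5074, -0.1278), (40.7128, -74.0060)]
    rewards = [hierarchical_reward(p, truth) for p in group]
    print(rewards, group_relative_advantages(rewards))
```

In this sketch a response only receives a positive advantage if it beats the average of its own group, so the policy is pushed toward the finer tiers without needing a separate value model. Rewards for reasoning depth and visual grounding, which the abstract also mentions, are omitted here because their form is not described.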
