2601.15356v2 Jan 21, 2026 eess.IV

Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing

Yu Wang (Citations: 16, h-index: 3)

Xiang Li (Citations: 42, h-index: 4)

Xueheng Li (Citations: 13, h-index: 2)

Xuanhua He (Citations: 620, h-index: 11)

Zhangchi Hu (Citations: 29, h-index: 2)

Weiwei Yu (Citations: 10, h-index: 2)

Chengjun Xie (Citations: 391, h-index: 8)

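Reinforcement learning (RL) has helped multimodal large language models (MLLMs) align more closely with human preferences in image quality assessment (IQA). However, existing RL-based IQA models generally rely on a coarse, global view and fail to capture the subtle local degradations that arise in high-resolution scenarios. Newer paradigms such as "Thinking with Images" enable multi-scale visual perception through zoom-in mechanisms, but applying them directly to IQA introduces a spurious "cropping-implies-degradation" bias and causes natural depth-of-field effects to be mistaken for artifacts. To address these problems, we propose Q-Probe, the first agentic IQA framework designed to scale IQA to high resolution through context-aware probing. First, we build Vista-Bench, a pioneering benchmark designed specifically for analyzing fine-grained local degradations in high-resolution IQA settings. We also propose a three-stage training paradigm that progressively aligns the model with human preferences while removing causal bias through a new context-aware cropping strategy. Extensive experiments show that Q-Probe achieves state-of-the-art performance in high-resolution settings while maintaining strong efficacy across resolution scales.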

Original Abstract

Reinforcement Learning (RL) has empowered Multimodal Large Language Models (MLLMs) to achieve superior human preference alignment in Image Quality Assessment (IQA). However, existing RL-based IQA models typically rely on coarse-grained global views, failing to capture subtle local degradations in high-resolution scenarios. While emerging "Thinking with Images" paradigms enable multi-scale visual perception via zoom-in mechanisms, their direct adaptation to IQA induces spurious "cropping-implies-degradation" biases and misinterprets natural depth-of-field as artifacts. To address these challenges, we propose Q-Probe, the first agentic IQA framework designed to scale IQA to high resolution via context-aware probing. First, we construct Vista-Bench, a pioneering benchmark tailored for fine-grained local degradation analysis in high-resolution IQA settings. Furthermore, we propose a three-stage training paradigm that progressively aligns the model with human preferences, while simultaneously eliminating causal bias through a novel context-aware cropping strategy. Extensive experiments demonstrate that Q-Probe achieves state-of-the-art performance in high-resolution settings while maintaining superior efficacy across resolution scales.
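The abstract does not say how the context-aware cropping strategy is implemented. As a purely illustrative sketch, one plausible reading is that a zoomed-in region is cropped together with a margin of its surroundings, so that a blurred patch can be judged against its neighborhood (for example, to tell depth-of-field blur from a real artifact). The names `Box`, `expand_with_context`, and `context_ratio` below are hypothetical and do not come from the paper.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned pixel box; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def expand_with_context(region: Box, image_w: int, image_h: int,
                        context_ratio: float = 0.5) -> Box:
    """Grow `region` on every side by `context_ratio` of its own size.

    Assumption for illustration only: "context" means a proportional
    margin around the probed region. The result is clamped so it never
    extends past the image borders.
    """
    pad_x = int((region.right - region.left) * context_ratio)
    pad_y = int((region.bottom - region.top) * context_ratio)
    return Box(
        max(0, region.left - pad_x),
        max(0, region.top - pad_y),
        min(image_w, region.right + pad_x),
        min(image_h, region.bottom + pad_y),
    )
```

For a 100x100 region at (100, 100) in a 4000x3000 image, `expand_with_context(Box(100, 100, 200, 200), 4000, 3000)` yields a 200x200 crop centered on the same point. Regions touching an image border are simply clipped on that side.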
