2603.16253v1 Mar 17, 2026 cs.CV

Grounding the Score: Explicit Visual Premise Verification for Reliable Vision-Language Process Reward Models

Zhengyi Yang (Citations: 1, h-index: 1)
Mengyu Zhou (Citations: 6, h-index: 2)
Erchao Zhao (Citations: 7, h-index: 2)
Xiaoxi Jiang (Citations: 39, h-index: 3)
Guanjun Jiang (Citations: 37, h-index: 3)
Junxin Wang (Citations: 13, h-index: 2)
Dai Guan (Citations: 0, h-index: 0)
Weijie Qiu (Citations: 120, h-index: 3)
Zhihang Li (Citations: 23, h-index: 3)
Yongbo Gai (Citations: 22, h-index: 2)

Abstract

Vision-language process reward models (VL-PRMs) are increasingly used to score intermediate reasoning steps and rerank candidates under test-time scaling. However, they often function as black-box judges: a low step score may reflect a genuine reasoning mistake or simply the verifier's misperception of the image. This entanglement between perception and reasoning leads to systematic false positives (rewarding hallucinated visual premises) and false negatives (penalizing correct grounded statements), undermining both reranking and error localization. We introduce Explicit Visual Premise Verification (EVPV), a lightweight verification interface that conditions step scoring on the reliability of the visual premises a step depends on. The policy is prompted to produce a step-wise visual checklist that makes required visual facts explicit, while a constraint extractor independently derives structured visual constraints from the input image. EVPV matches checklist claims against these constraints to compute a scalar visual reliability signal, and calibrates PRM step rewards via reliability gating: rewards for visually dependent steps are attenuated when reliability is low and preserved when reliability is high. This decouples perceptual uncertainty from logical evaluation without per-step tool calls. Experiments on VisualProcessBench and six multimodal reasoning benchmarks show that EVPV improves step-level verification and consistently boosts Best-of-N reranking accuracy over strong baselines. Furthermore, injecting controlled corruption into the extracted constraints produces monotonic performance degradation, providing causal evidence that the gains arise from constraint fidelity and explicit premise verification rather than incidental prompt effects. Code is available at: https://github.com/Qwen-Applications/EVPV-PRM
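
The reliability-gating idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper or its repository: the dictionary encoding of visual facts, the exact-match rule, the per-step `visual_dependence` weight, the linear gate, and the mean-reward reranker are all hypothetical stand-ins for whatever EVPV actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical encoding: each visual fact is a "subject.attribute" key with a string value.
Facts = Dict[str, str]


@dataclass
class Step:
    prm_score: float              # raw PRM step reward, assumed to lie in [0, 1]
    visual_dependence: float      # assumed weight: 0 = purely logical, 1 = fully image-dependent
    checklist: Facts = field(default_factory=dict)  # visual facts the step claims to rely on


def visual_reliability(checklist: Facts, constraints: Facts) -> float:
    """Fraction of checklist claims that exactly match the extracted constraints.

    A step with no visual claims is treated as fully reliable (assumption).
    """
    if not checklist:
        return 1.0
    matched = sum(1 for key, value in checklist.items() if constraints.get(key) == value)
    return matched / len(checklist)


def gated_reward(step: Step, constraints: Facts) -> float:
    """Attenuate the PRM reward when a visually dependent step rests on unreliable premises.

    Assumed linear gate: the reward is unchanged at reliability 1 and shrinks in
    proportion to visual_dependence as reliability falls to 0.
    """
    reliability = visual_reliability(step.checklist, constraints)
    gate = 1.0 - step.visual_dependence * (1.0 - reliability)
    return step.prm_score * gate


def rerank_best_of_n(candidates: List[List[Step]], constraints: Facts) -> int:
    """Return the index of the candidate with the highest mean gated step reward."""
    def mean_reward(steps: List[Step]) -> float:
        if not steps:
            return 0.0
        return sum(gated_reward(s, constraints) for s in steps) / len(steps)

    return max(range(len(candidates)), key=lambda i: mean_reward(candidates[i]))


# Usage: constraints as an (assumed) extractor might produce them from a geometry figure.
constraints = {"triangle.angle_A": "90", "segment.AB": "3"}

# Candidate 0 scores well under the raw PRM but relies on a hallucinated angle.
hallucinated = [
    Step(prm_score=0.9, visual_dependence=1.0, checklist={"triangle.angle_A": "60"}),
    Step(prm_score=0.9, visual_dependence=0.0),
]
# Candidate 1 scores slightly lower under the raw PRM but is correctly grounded.
grounded = [
    Step(prm_score=0.8, visual_dependence=1.0, checklist={"triangle.angle_A": "90"}),
    Step(prm_score=0.8, visual_dependence=0.0),
]

best = rerank_best_of_n([hallucinated, grounded], constraints)
```

In this toy setup the raw PRM scores alone would favor the first candidate, while the gated scores favor the second, because the first candidate's image-dependent step loses its reward once its premise fails verification. Purely logical steps (dependence 0) pass through unchanged, which mirrors the abstract's claim that perceptual uncertainty is separated from logical evaluation.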
