2602.14065v1 Feb 15, 2026 cs.AI

REAL: Resolving Knowledge Conflicts in Knowledge-Intensive Visual Question Answering via Reasoning-Pivot Alignment

Kai Ye

Xianwei Mao

Sheng Zhou

Zirui Shao

Ye Mo

Zhejiang University

Liangliang Liu

Haikuan Huang

Bin Li

Jiajun Bu

Original Abstract

Knowledge-intensive Visual Question Answering (KI-VQA) frequently suffers from severe knowledge conflicts caused by the inherent limitations of open-domain retrieval. However, existing paradigms face critical limitations due to the lack of generalizable conflict detection and intra-model constraint mechanisms to handle conflicting evidence. To address these challenges, we propose the REAL (Reasoning-Pivot Alignment) framework centered on the novel concept of the Reasoning-Pivot. Distinct from reasoning steps that prioritize internal self-derivation, a reasoning-pivot serves as an atomic unit (node or edge) in the reasoning chain that emphasizes knowledge linkage, and it typically relies on external evidence to complete the reasoning. Supported by our constructed REAL-VQA dataset, our approach integrates Reasoning-Pivot Aware SFT (RPA-SFT) to train a generalizable discriminator by aligning conflicts with pivot extraction, and employs Reasoning-Pivot Guided Decoding (RPGD), an intra-model decoding strategy that leverages these pivots for targeted conflict mitigation. Extensive experiments across diverse benchmarks demonstrate that REAL significantly enhances discrimination accuracy and achieves state-of-the-art performance, validating the effectiveness of our pivot-driven resolution paradigm.
