2601.12781v1 Jan 19, 2026 cs.AI


VIRO: Robust and Efficient Neuro-Symbolic Reasoning with Verification for Referring Expression Comprehension

Jungseul Ok

Hyejin Park

Junhyuk Kwon

S. Kwak

Original Abstract

Referring Expression Comprehension (REC) aims to localize the image region corresponding to a natural-language query. Recent neuro-symbolic REC approaches leverage large language models (LLMs) and vision-language models (VLMs) to perform compositional reasoning, decomposing queries into structured programs and executing them step-by-step. While such approaches achieve interpretable reasoning and strong zero-shot generalization, they assume that intermediate reasoning steps are accurate. However, this assumption causes cascading errors: false detections and invalid relations propagate through the reasoning chain, yielding high-confidence false positives even when no target is present in the image. To address this limitation, we introduce Verification-Integrated Reasoning Operators (VIRO), a neuro-symbolic framework that embeds lightweight operator-level verifiers within reasoning steps. Each operator executes and validates its output, such as object existence or spatial relationship, thereby allowing the system to robustly handle no-target cases when verification conditions are not met. Our framework achieves state-of-the-art performance, reaching 61.1% balanced accuracy across target-present and no-target settings, and demonstrates generalization to real-world egocentric data. Furthermore, VIRO shows superior computational efficiency in terms of throughput, high reliability with a program failure rate of less than 0.3%, and scalability by decoupling program generation from execution.
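The abstract describes operators that each execute a reasoning step and then verify their own output, returning "no target" instead of passing a dubious result downstream. The Python sketch below illustrates that idea under stated assumptions. The operator names (`find`, `left_of`), the 0.5 score threshold, the box-center rule for "left of", and the early-abort executor are all hypothetical and not taken from the paper. It also assumes balanced accuracy is the unweighted mean of accuracy on target-present samples (counted correct at IoU ≥ 0.5, a common REC criterion) and accuracy on no-target samples (counted correct when the system abstains). The paper's exact protocol may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Box:
    """A detected region: label, corner coordinates, and confidence score."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float
    score: float

    @property
    def cx(self) -> float:
        return (self.x1 + self.x2) / 2


# Hypothetical operator type: maps the current candidates to new candidates,
# or returns None when its built-in verifier rejects the result ("no target").
Operator = Callable[[list[Box]], Optional[list[Box]]]


def find(dets: Sequence[Box], label: str, min_score: float = 0.5) -> Operator:
    """Hypothetical FIND operator with an object-existence check."""
    def op(_candidates: list[Box]) -> Optional[list[Box]]:
        hits = [b for b in dets if b.label == label and b.score >= min_score]
        return hits or None  # verify: at least one confident detection exists
    return op


def left_of(dets: Sequence[Box], anchor: str, min_score: float = 0.5) -> Operator:
    """Hypothetical LEFT_OF operator with a spatial-relation check."""
    def op(candidates: list[Box]) -> Optional[list[Box]]:
        anchors = [b for b in dets if b.label == anchor and b.score >= min_score]
        if not anchors:
            return None  # verify: the anchor object must exist
        kept = [c for c in candidates if any(c.cx < a.cx for a in anchors)]
        return kept or None  # verify: the relation holds for some candidate
    return op


def run_program(ops: Sequence[Operator]) -> Optional[Box]:
    """Run operators in order; abort with None at the first failed check."""
    candidates: list[Box] = []
    for op in ops:
        result = op(candidates)
        if result is None:
            return None
        candidates = result
    return max(candidates, key=lambda b: b.score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def balanced_accuracy(preds: Sequence[Optional[Box]],
                      gts: Sequence[Optional[Box]],
                      thr: float = 0.5) -> float:
    """Mean of target-present and no-target accuracy (both subsets non-empty)."""
    present = [(p, g) for p, g in zip(preds, gts) if g is not None]
    absent = [p for p, g in zip(preds, gts) if g is None]
    acc_present = sum(p is not None and iou(p, g) >= thr for p, g in present) / len(present)
    acc_absent = sum(p is None for p in absent) / len(absent)
    return (acc_present + acc_absent) / 2


# Usage: "the cup left of the laptop" versus a query whose target is absent.
dets = [Box("cup", 10, 10, 30, 30, 0.9), Box("laptop", 50, 0, 120, 60, 0.8)]
print(run_program([find(dets, "cup"), left_of(dets, "laptop")]))  # the cup box
print(run_program([find(dets, "mug")]))  # None: existence check fails
```

In this sketch, returning `None` at the first failed check is what keeps a false detection or an invalid relation from propagating into a high-confidence false positive. A real system would obtain detections and relation evidence from a VLM rather than from a fixed list.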