2602.06090v1 Feb 05, 2026 cs.SE

SVRepair: Structured Visual Reasoning for Automated Program Repair

Dajun Chen

Citations: 141

h-index: 5

Wei Jiang

Citations: 39

h-index: 5

Yong Li

Citations: 33

h-index: 3

Sheng Zhou

Citations: 78

h-index: 5

Xiaoxuan Tang

Citations: 30

h-index: 3

Jincheng Wang

Citations: 90

h-index: 5

L. Luo

Citations: 13

h-index: 1

Jingxuan Xu

Citations: 11

h-index: 2

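Large language models (LLMs) have recently shown strong potential for automated program repair (APR), but most existing approaches are unimodal and do not exploit the rich diagnostic information in visual artifacts such as screenshots and control-flow graphs. In practice, many bug reports convey key information visually (e.g., broken layouts or missing widgets), yet feeding such dense visual inputs directly to a model often causes context loss and noise, which makes it hard for the model to connect visual observations to precise fault localization and executable patches. To close this semantic gap, we propose **SVRepair**, a multimodal APR framework with structured visual representation. SVRepair first fine-tunes a vision-language model, **Structured Visual Representation (SVR)**, to uniformly transform heterogeneous visual artifacts into a *semantic scene graph* that captures GUI elements and their structural relations (e.g., hierarchy). This gives downstream repair a normalized, code-relevant context. Building on the graph, SVRepair uses a coding agent to localize faults and generate patches, and it introduces an iterative visual-artifact segmentation strategy that focuses on bug-centered regions to suppress irrelevant context and reduce hallucinations. Across multiple benchmarks, SVRepair reports state-of-the-art performance: **36.47%** accuracy on SWE-Bench M, **38.02%** on MMCode, and **95.12%** on CodeVision.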

Original Abstract

Large language models (LLMs) have recently shown strong potential for Automated Program Repair (APR), yet most existing approaches remain unimodal and fail to leverage the rich diagnostic signals contained in visual artifacts such as screenshots and control-flow graphs. In practice, many bug reports convey critical information visually (e.g., layout breakage or missing widgets), but directly using such dense visual inputs often causes context loss and noise, making it difficult for MLLMs to ground visual observations into precise fault localization and executable patches. To bridge this semantic gap, we propose **SVRepair**, a multimodal APR framework with structured visual representation. SVRepair first fine-tunes a vision-language model, **Structured Visual Representation (SVR)**, to uniformly transform heterogeneous visual artifacts into a *semantic scene graph* that captures GUI elements and their structural relations (e.g., hierarchy), providing normalized, code-relevant context for downstream repair. Building on the graph, SVRepair drives a coding agent to localize faults and synthesize patches, and further introduces an iterative visual-artifact segmentation strategy that progressively narrows the input to bug-centered regions to suppress irrelevant context and reduce hallucinations. Extensive experiments across multiple benchmarks demonstrate state-of-the-art performance: SVRepair achieves **36.47%** accuracy on SWE-Bench M, **38.02%** on MMCode, and **95.12%** on CodeVision, validating the effectiveness of SVRepair for multimodal program repair.
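
The abstract describes two ideas that a small example may make more concrete: a scene graph that records GUI elements with their hierarchy, and a narrowing step that focuses on the region around a suspected bug. The sketch below is an illustration only and is not the authors' implementation. In the paper the graph comes from a fine-tuned vision-language model, whereas here it is built by hand. The names `Node`, `render`, `narrow`, and `malformed` are hypothetical, and the `malformed` check merely stands in for whatever signal the coding agent would use to judge a region suspicious.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Node:
    """One GUI element in a hypothetical scene graph."""
    name: str
    kind: str
    box: Box
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> str:
    """Serialize the hierarchy as indented text that an agent could read."""
    line = f"{'  ' * depth}{node.kind} {node.name} box={node.box}"
    return "\n".join([line] + [render(c, depth + 1) for c in node.children])


def narrow(root: Node, is_suspicious: Callable[[Node], bool],
           max_steps: int = 5) -> Node:
    """Descend step by step into the child subtree that holds a suspicious element."""

    def contains(n: Node) -> bool:
        return is_suspicious(n) or any(contains(c) for c in n.children)

    current = root
    for _ in range(max_steps):
        nxt = next((c for c in current.children if contains(c)), None)
        if nxt is None:  # nothing narrower to focus on
            break
        current = nxt
    return current


def malformed(n: Node) -> bool:
    """Assumed stand-in for a bug signal: a box with no positive area."""
    left, top, right, bottom = n.box
    return right <= left or bottom <= top


# Hand-built graph for an imaginary page (illustrative data only).
page = Node("page", "window", (0, 0, 800, 600), [
    Node("header", "panel", (0, 0, 800, 80), [
        Node("logo", "image", (10, 10, 60, 60)),
    ]),
    Node("form", "panel", (0, 80, 800, 600), [
        Node("email", "input", (20, 100, 400, 130)),
        Node("submit", "button", (20, 150, 120, 140)),  # bottom < top
    ]),
])

focus = narrow(page, malformed)
print(render(page))
print("focus:", focus.name)
```

Each pass of `narrow` discards sibling subtrees that contain nothing suspicious, which loosely mirrors the stated goal of suppressing irrelevant context. A real system would operate on image regions and model outputs instead of hand-written boxes.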

