2604.05514v1 Apr 07, 2026 cs.AI

OmniDiagram: Advancing Unified Diagram Code Generation via Visual Interrogation Reward

Haoyue Yang (Citations: 27, h-index: 2)

Xuanle Zhao (Citations: 58, h-index: 4)

Xuexin Liu (Citations: 26, h-index: 2)

Fei Jiang (Citations: 3, h-index: 1)

Yaoming Zhu (Citations: 33, h-index: 3)

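The programmable diagram generation paradigm is evolving rapidly and plays an important role in structured visualization. However, most existing work remains limited to narrow task definitions and language support, which restricts its applicability to diverse diagram types. In this work, we propose OmniDiagram, a unified framework that integrates diverse diagram code languages and task definitions. To address the difficulty of aligning code logic with visual fidelity in reinforcement learning (RL), we introduce a new visual feedback strategy, Visual Interrogation Verifies All (Viva). Unlike rigid syntax-based rules or pixel-level matching, Viva rewards the visual structure of the rendered diagram through a generative approach. Specifically, Viva generates purposefully designed visual queries to closely examine a diagram's visual fidelity and provides fine-grained feedback for optimization. This mechanism enables a self-evolving training process that effectively eliminates the need for manually annotated ground-truth code. We also construct M3$^2$Diagram, the first large-scale diagram code generation dataset, containing more than 196,000 high-quality data instances. Experimental results confirm that combining SFT with Viva-based RL enables OmniDiagram to achieve new state-of-the-art (SOTA) performance on diagram code generation benchmarks.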

Original Abstract

The paradigm of programmable diagram generation is evolving rapidly, playing a crucial role in structured visualization. However, most existing studies are confined to a narrow range of task formulations and language support, constraining their applicability to diverse diagram types. In this work, we propose OmniDiagram, a unified framework that incorporates diverse diagram code languages and task definitions. To address the challenge of aligning code logic with visual fidelity in Reinforcement Learning (RL), we introduce a novel visual feedback strategy named Visual Interrogation Verifies All (Viva). Unlike brittle syntax-based rules or pixel-level matching, Viva rewards the visual structure of rendered diagrams through a generative approach. Specifically, Viva actively generates targeted visual inquiries to scrutinize diagram visual fidelity and provides fine-grained feedback for optimization. This mechanism facilitates a self-evolving training process, effectively obviating the need for manually annotated ground truth code. Furthermore, we construct M3$^2$Diagram, the first large-scale diagram code generation dataset, containing over 196k high-quality instances. Experimental results confirm that the combination of SFT and our Viva-based RL allows OmniDiagram to establish a new state-of-the-art (SOTA) across diagram code generation benchmarks.
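The abstract does not say how Viva generates its inquiries or turns the answers into a scalar reward. As a rough illustration only, the sketch below scores a rendered diagram by the fraction of visual questions answered as expected. The names `VisualQuery`, `interrogation_reward`, and `answer_fn` are hypothetical and do not come from the paper; in practice `answer_fn` would presumably wrap a vision-language model, and the paper's actual scoring may differ.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class VisualQuery:
    """Hypothetical container for one visual inquiry (not from the paper)."""
    question: str  # e.g. "Is there an arrow from A to B?"
    expected: str  # answer implied by the target diagram


def interrogation_reward(
    rendered_image: object,
    queries: Sequence[VisualQuery],
    answer_fn: Callable[[object, str], str],
) -> float:
    """Return the fraction of queries answered as expected on the rendered image.

    Assumption: exact match after trimming and lowercasing is enough for scoring.
    """
    if not queries:
        return 0.0  # no inquiries means no evidence, so no reward
    hits = sum(
        answer_fn(rendered_image, q.question).strip().lower()
        == q.expected.strip().lower()
        for q in queries
    )
    return hits / len(queries)


def toy_answer_fn(image: object, question: str) -> str:
    """Stand-in answerer: the 'image' is just a set of questions that hold true."""
    return "Yes" if question in image else "No"
```

With a toy "image" in which only one of two expected edges is present, `interrogation_reward` returns 0.5. A partial score of this kind is one way a reward could stay informative where a pass/fail syntax check or pixel comparison would not.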
