2601.18320v1 Jan 26, 2026 cs.CL

MultiVis-Agent: A Multi-Agent Framework with Logic Rules for Reliable and Comprehensive Cross-Modal Data Visualization

Chen Zhang (Citations: 22, h-index: 3)

Raymond Chi-Wing Wong (Citations: 194, h-index: 7)

Jinwei Lu (Citations: 25, h-index: 4)

Yuanfeng Song (Citations: 50, h-index: 4)

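Real-world visualization tasks involve complex, multi-modal requirements that go beyond simple text-to-chart generation, calling for reference images, code examples, and iterative refinement. Current systems have fundamental limitations: single-modality input, one-shot generation, and rigid workflows. LLM-based approaches show potential for these complex requirements, but they raise reliability problems such as catastrophic failures and susceptibility to infinite loops. To address these problems, we propose MultiVis-Agent, a logic rule-enhanced multi-agent framework for reliable multi-modal and multi-scenario visualization generation. Our approach introduces a four-layer logic rule framework that provides mathematical guarantees of system reliability while preserving flexibility. Unlike traditional rule-based systems, our logic rules do not replace LLM reasoning; they are mathematical constraints that guide it. We formalize the MultiVis task, which covers four scenarios ranging from basic generation to iterative refinement, and develop MultiVis-Bench, a benchmark of more than 1,000 cases for evaluating multi-modal visualization. Extensive experiments show that our approach achieves a visualization score of 75.63% on challenging tasks, well above the baselines (57.54-62.79%), with a task completion rate of 99.58% and a code execution success rate of 94.56% (versus 74.48% and 65.10%, respectively, without logic rules), addressing both the complexity and the reliability problems of automated visualization generation.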

Original Abstract

Real-world visualization tasks involve complex, multi-modal requirements that extend beyond simple text-to-chart generation, requiring reference images, code examples, and iterative refinement. Current systems exhibit fundamental limitations: single-modality input, one-shot generation, and rigid workflows. While LLM-based approaches show potential for these complex requirements, they introduce reliability challenges including catastrophic failures and infinite loop susceptibility. To address this gap, we propose MultiVis-Agent, a logic rule-enhanced multi-agent framework for reliable multi-modal and multi-scenario visualization generation. Our approach introduces a four-layer logic rule framework that provides mathematical guarantees for system reliability while maintaining flexibility. Unlike traditional rule-based systems, our logic rules are mathematical constraints that guide LLM reasoning rather than replacing it. We formalize the MultiVis task spanning four scenarios from basic generation to iterative refinement, and develop MultiVis-Bench, a benchmark with over 1,000 cases for multi-modal visualization evaluation. Extensive experiments demonstrate that our approach achieves 75.63% visualization score on challenging tasks, significantly outperforming baselines (57.54-62.79%), with task completion rates of 99.58% and code execution success rates of 94.56% (vs. 74.48% and 65.10% without logic rules), successfully addressing both complexity and reliability challenges in automated visualization generation.
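
To make the reported gaps easier to read, the short sketch below works out the percentage-point differences implied by the figures in the abstract. The numbers are taken directly from the abstract; the variable names and the `gap` helper are illustrative assumptions and do not come from the paper.

```python
# Figures quoted in the abstract, all in percent.
with_rules = {"task_completion": 99.58, "code_execution": 94.56}
without_rules = {"task_completion": 74.48, "code_execution": 65.10}
vis_score = 75.63                 # MultiVis-Agent visualization score on challenging tasks
baseline_range = (57.54, 62.79)   # lowest and highest baseline scores reported


def gap(a: float, b: float) -> float:
    """Difference a - b in percentage points, rounded to two decimals (hypothetical helper)."""
    return round(a - b, 2)


# Gain attributed to the logic rules, per metric.
rule_gains = {name: gap(with_rules[name], without_rules[name]) for name in with_rules}

# Margin over the weakest and the strongest baseline.
baseline_margins = tuple(gap(vis_score, b) for b in baseline_range)

print(rule_gains)
print(baseline_margins)
```

Under these figures, the logic rules account for roughly 25 points of task completion and 29 points of code execution success, and the visualization score sits about 13 to 18 points above the baselines.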

4 Citations

0 Influential

3.5 Altmetric

21.5 Score
