2601.21403v1 Jan 29, 2026 cs.AI

DataCross: A Unified Benchmark and Agent Framework for Cross-Modal Heterogeneous Data Analysis

Zhou Liu

Wentao Zhang

Ruyi Qi

Original Abstract

In real-world data science and enterprise decision-making, critical information is often fragmented across directly queryable structured sources (e.g., SQL, CSV) and "zombie data" locked in unstructured visual documents (e.g., scanned reports, invoice images). Existing data analytics agents are predominantly limited to processing structured data, failing to activate and correlate this high-value visual information, thus creating a significant gap with industrial needs. To bridge this gap, we introduce DataCross, a novel benchmark and collaborative agent framework for unified, insight-driven analysis across heterogeneous data modalities. DataCrossBench comprises 200 end-to-end analysis tasks across finance, healthcare, and other domains. It is constructed via a human-in-the-loop reverse-synthesis pipeline, ensuring realistic complexity, cross-source dependency, and verifiable ground truth. The benchmark categorizes tasks into three difficulty tiers to evaluate agents' capabilities in visual table extraction, cross-modal alignment, and multi-step joint reasoning. We also propose the DataCrossAgent framework, inspired by the "divide-and-conquer" workflow of human analysts. It employs specialized sub-agents, each an expert on a specific data source, which are coordinated via a structured workflow of Intra-source Deep Exploration, Key Source Identification, and Contextual Cross-pollination. A novel reReAct mechanism enables robust code generation and debugging for factual verification. Experimental results show that DataCrossAgent achieves a 29.7% improvement in factuality over GPT-4o and exhibits superior robustness on high-difficulty tasks, effectively activating fragmented "zombie data" for insightful, cross-modal analysis.
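The abstract names three coordination stages but gives no implementation detail. As a rough illustration only, the toy sketch below shows one way per-source sub-agents could be explored in isolation, ranked to pick a key source, and then joined against that source. Everything here is an assumption made for the example, not the authors' code: the `SubAgent` class, the column-overlap heuristic for choosing the key source, the dictionary-merge join, and the sample rows (the "scan" rows stand in for a table already extracted from an invoice image).

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    """Hypothetical per-source agent holding rows already read from its source."""
    name: str
    rows: list
    findings: list = field(default_factory=list)

    def columns(self) -> set:
        # Union of all keys seen in this source's rows.
        return {key for row in self.rows for key in row}

    def explore(self) -> None:
        # Stage 1 (intra-source exploration): describe this source on its own.
        self.findings.append(
            f"{self.name}: {len(self.rows)} rows, columns={sorted(self.columns())}"
        )


def identify_key_source(agents):
    # Stage 2 (key source identification): assumed heuristic that picks the
    # source sharing the most column names with the other sources.
    def overlap(agent):
        return sum(
            len(agent.columns() & other.columns())
            for other in agents
            if other is not agent
        )
    return max(agents, key=overlap)


def cross_pollinate(key, others):
    # Stage 3 (contextual cross-pollination): join each remaining source onto
    # the key source using whatever columns the two have in common.
    insights = []
    for other in others:
        shared = sorted(key.columns() & other.columns())
        if not shared:
            continue
        index = {
            tuple(row[c] for c in shared): row
            for row in key.rows
            if all(c in row for c in shared)
        }
        for row in other.rows:
            match = index.get(tuple(row.get(c) for c in shared))
            if match is not None:
                insights.append({**match, **row})
    return insights


# Made-up sample data: one structured source and two auxiliary sources.
agents = [
    SubAgent("sql", [
        {"invoice_id": 1, "customer_id": "C7", "billed": 100},
        {"invoice_id": 2, "customer_id": "C9", "billed": 250},
    ]),
    SubAgent("scan", [{"invoice_id": 2, "paid": 200}]),
    SubAgent("crm", [{"customer_id": "C9", "segment": "enterprise"}]),
]

for agent in agents:
    agent.explore()

key = identify_key_source(agents)
insights = cross_pollinate(key, [a for a in agents if a is not key])
```

In this sketch the joined rows in `insights` pair a billed amount from the structured source with a paid amount recovered from the visual source, which is the kind of cross-source discrepancy the benchmark tasks are described as targeting. A real system would replace the exact-match join with the extraction, alignment, and verification steps the paper proposes.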
