2604.10973v1 Apr 13, 2026 cs.AI

CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning

Yiding Sun (Citations: 65, h-index: 5)
Dongxu Zhang (Citations: 76, h-index: 6)
Hongqiang Lin (Citations: 24, h-index: 2)
Qixian Huang (Citations: 11, h-index: 2)
Yingsen Wang (Citations: 7, h-index: 1)
Qirui Wang (Citations: 92, h-index: 3)
Tongxi Fu (Citations: 5, h-index: 1)
Zhen-Xin Fu (Citations: 3, h-index: 1)

Original Abstract

Reasoning over tabular data is a crucial capability for tasks like question answering and fact verification, as it requires models to comprehend both free-form questions and semi-structured tables. However, while methods like Chain-of-Thought (CoT) introduce reasoning chains, purely symbolic methods are inherently limited by their blindness to holistic visual patterns. To address this, we propose the Coarse-to-Fine Multimodal Synthesis framework (CFMS), a novel two-stage paradigm that hierarchically decouples high-level visual perception from granular symbolic reasoning. In the Coarse Stage, CFMS leverages Multimodal Large Language Models (MLLMs) to perform a one-time synthesis of a multi-perspective knowledge tuple. This tuple subsequently serves as a dynamic reasoning map to guide the Fine Stage, where a symbolic engine executes a targeted and efficient sequence of iterative operations over the table. Extensive experiments on the WikiTQ and TabFact benchmarks demonstrate that CFMS achieves competitive accuracy. The framework exhibits particular robustness when handling large tables and when instantiated with smaller backbone models, validating its effectiveness and generalizability.
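
The two-stage control flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the contents of the knowledge tuple, the operation set, or the MLLM interface, so `KnowledgeTuple`, its fields, the three table operations, and the hard-coded `coarse_stage` stub are all hypothetical stand-ins. The sketch only shows the shape of the idea: one coarse call produces a plan, and a symbolic engine then applies that plan step by step to the table.

```python
from dataclasses import dataclass, field

Table = list[dict[str, str]]


@dataclass
class KnowledgeTuple:
    # Hypothetical fields: the abstract does not say what the tuple contains.
    relevant_columns: list[str]
    plan: list[tuple[str, dict]] = field(default_factory=list)


def coarse_stage(table: Table, question: str) -> KnowledgeTuple:
    """Stand-in for the one-time MLLM synthesis step.

    Assumption: a real system would query a multimodal model here.
    This stub ignores its inputs and returns a fixed plan for the demo question.
    """
    return KnowledgeTuple(
        relevant_columns=["country", "gold"],
        plan=[
            ("select_columns", {"columns": ["country", "gold"]}),
            ("sort_desc", {"by": "gold"}),
            ("head", {"n": 1}),
        ],
    )


# Hypothetical symbolic operations over a table stored as a list of row dicts.
def select_columns(table: Table, columns: list[str]) -> Table:
    return [{c: row[c] for c in columns} for row in table]


def sort_desc(table: Table, by: str) -> Table:
    return sorted(table, key=lambda row: int(row[by]), reverse=True)


def head(table: Table, n: int) -> Table:
    return table[:n]


OPERATIONS = {"select_columns": select_columns, "sort_desc": sort_desc, "head": head}


def fine_stage(table: Table, knowledge: KnowledgeTuple) -> Table:
    """Apply the planned operations one at a time, shrinking the table each step."""
    for name, kwargs in knowledge.plan:
        table = OPERATIONS[name](table, **kwargs)
    return table


def answer(table: Table, question: str) -> str:
    knowledge = coarse_stage(table, question)  # single coarse call
    result = fine_stage(table, knowledge)      # iterative symbolic execution
    return next(iter(result[0].values()))      # first cell of the reduced table


# Toy data invented for this illustration.
MEDALS: Table = [
    {"country": "Germany", "gold": "12", "silver": "10"},
    {"country": "Norway", "gold": "16", "silver": "8"},
    {"country": "Canada", "gold": "4", "silver": "8"},
]
```

Calling `answer(MEDALS, "Which country won the most gold medals?")` runs the stubbed coarse stage once and then the three symbolic steps in order. Keeping the plan as plain data (operation name plus arguments) is one simple way to separate perception from execution; the paper may structure this differently.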
