2604.03393v1 Apr 03, 2026 cs.AI

TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering

Tung Sum Thomas Kwok (citations: 30, h-index: 2)
Chunhe Wang (citations: 2, h-index: 1)
Xiaofeng Lin (citations: 76, h-index: 4)
Peng Lu (citations: 3, h-index: 1)
Nan Tang (citations: 410, h-index: 11)
Xinyu Wang (citations: 64, h-index: 4)
Changlun Li (citations: 24, h-index: 3)
Hanwei Wu (citations: 0, h-index: 0)
Elisa Kreiss (citations: 35, h-index: 3)
Guang Cheng (citations: 16, h-index: 2)

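Multimodal reasoning has emerged as a powerful framework for improving the reasoning ability of reasoning models. Multi-turn table reasoning methods have raised reasoning accuracy through tool use and reward modeling, but they read out table state through a fixed text serialization. This introduces representation errors in the table encoding, and those errors accumulate substantially over multiple turns. Table grounding methods can mitigate the accumulation, but they raise inference compute and cost, which makes real-world deployment difficult. To address this, we introduce TABQAWORLD, a table reasoning framework that jointly optimizes table actions through both representation and estimation. On the representation side, TABQAWORLD uses an action-conditioned multimodal selection policy that switches dynamically between visual and textual representations to maximize the reliability of table state readouts. On the estimation side, TABQAWORLD uses table metadata such as dimensions, data types, and key values to optimize the stepwise reasoning trajectory, plan the trajectory safely, and compress low-complexity actions, reducing the number of conversation turns and the latency. Designed as a training-free framework, TABQAWORLD achieves a 4.87% accuracy improvement over baselines in experimental evaluation, and a 5.42% accuracy improvement with a 33.35% reduction in inference latency relative to static settings, setting a new standard for stable and efficient table reasoning.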
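As a rough illustration of the representation side, the sketch below shows one way an action-conditioned choice between a textual and a visual table readout could be written. It is not the authors' policy: the action names, the `TableMeta` fields, the `max_text_cells` threshold, and the rules themselves are hypothetical and exist only to make the idea concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical summary of the table state."""
    n_rows: int
    n_cols: int
    has_merged_cells: bool = False


# Hypothetical action names; the abstract does not list the action space.
LAYOUT_SENSITIVE_ACTIONS = {"locate_header", "read_region"}
VALUE_SENSITIVE_ACTIONS = {"filter_rows", "aggregate", "sort"}


def select_modality(action: str, meta: TableMeta, max_text_cells: int = 400) -> Modality:
    """Pick a readout modality for the next action (illustrative rules only)."""
    # Assumption: layout-dependent reads are served better by a rendered image.
    if action in LAYOUT_SENSITIVE_ACTIONS or meta.has_merged_cells:
        return Modality.IMAGE
    # Assumption: exact cell values are read more reliably from text.
    if action in VALUE_SENSITIVE_ACTIONS:
        return Modality.TEXT
    # Fallback: serialize small tables as text, render large ones as an image.
    if meta.n_rows * meta.n_cols <= max_text_cells:
        return Modality.TEXT
    return Modality.IMAGE
```

For example, `select_modality("aggregate", TableMeta(10, 5))` yields `Modality.TEXT` under these assumed rules, while the same call on a table with merged cells yields `Modality.IMAGE`.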

Original Abstract

Multimodal reasoning has emerged as a powerful framework for enhancing the reasoning capabilities of reasoning models. While multi-turn table reasoning methods have improved reasoning accuracy through tool use and reward modeling, they rely on fixed text serialization for table state readouts. This introduces representation errors in table encoding that accumulate significantly over multiple turns. Tabular grounding methods alleviate this accumulation, but at the expense of inference compute and cost, rendering real-world deployment impractical. To address this, we introduce TABQAWORLD, a table reasoning framework that jointly optimizes tabular actions through representation and estimation. For representation, TABQAWORLD employs an action-conditioned multimodal selection policy, which dynamically switches between visual and textual representations to maximize table state readout reliability. For estimation, TABQAWORLD optimizes the stepwise reasoning trajectory using table metadata, including dimensions, data types, and key values, safely planning the trajectory and compressing low-complexity actions to reduce conversation turns and latency. TABQAWORLD is designed as a training-free framework. Empirical evaluations show that it achieves state-of-the-art performance, with a 4.87% accuracy improvement over baselines and, relative to static settings, a 5.42% accuracy gain and a 33.35% reduction in inference latency, establishing a new standard for reliable and efficient table reasoning.
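The estimation side can be pictured with a small sketch as well. The code below checks each planned action against table metadata and merges consecutive low-complexity actions into a single turn. It is a minimal sketch under stated assumptions, not the paper's algorithm: the `Action` type, the `LOW_COMPLEXITY` set, the dtype labels, and the merging rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    column: str


# Hypothetical set of cheap actions that may share one conversation turn.
LOW_COMPLEXITY = {"select_column", "filter_rows", "sort"}
NUMERIC_DTYPES = {"int", "float"}


def is_safe(action: Action, columns: dict) -> bool:
    """Metadata check: the column must exist, and aggregation needs a numeric dtype."""
    dtype = columns.get(action.column)
    if dtype is None:
        return False
    if action.name == "aggregate" and dtype not in NUMERIC_DTYPES:
        return False
    return True


def plan_turns(actions: list, columns: dict) -> list:
    """Group actions into turns, merging runs of low-complexity actions."""
    turns = []
    for action in actions:
        if not is_safe(action, columns):
            raise ValueError(f"unsafe action {action.name!r} on column {action.column!r}")
        mergeable = bool(turns) and action.name in LOW_COMPLEXITY and all(
            prev.name in LOW_COMPLEXITY for prev in turns[-1]
        )
        if mergeable:
            turns[-1].append(action)  # reuse the current turn
        else:
            turns.append([action])  # start a new turn
    return turns
```

With columns `{"year": "int", "team": "str"}`, a plan of filter, sort, aggregate, and select steps is grouped into three turns instead of four, and an aggregate over the string column `team` is rejected before any model call is made.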
