2604.03157v1 Apr 03, 2026 cs.AI

Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models

Amit Dhanda (Citations: 0, h-index: 0)

Shekhar Jain (Citations: 12, h-index: 2)

Yunfei Bai (Citations: 80, h-index: 4)

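Recent advances in Vision Language Models (VLMs) show progress toward true intelligence, which requires robust reasoning capabilities. Beyond pattern recognition, linguistic reasoning must be integrated with visual comprehension, particularly in Chart Question Answering (CQA) tasks that involve complex data visualizations. Current VLMs have significant limitations in CQA: imprecise numerical extraction, difficulty interpreting implicit visual relationships, and attention mechanisms that are inadequate for capturing spatial relationships within charts. To address these problems, we propose Chart-RL, a new reinforcement learning framework that improves VLMs' chart understanding through feedback-driven policy optimization of visual perception and logical inference. Our key innovation is a comprehensive framework that combines Reinforcement Learning (RL) based on policy optimization techniques with adaptive reward functions; it outperforms baseline foundation models and is competitive with larger state-of-the-art architectures. We also integrate Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning technique, into the RL framework, so that it requires only a single-GPU configuration while preserving performance. We benchmarked extensively on the ChartQAPro dataset against open-source, proprietary, and state-of-the-art closed-source models. The RL fine-tuned Qwen3-VL-4B-Instruct model achieved an answer accuracy of 0.634, surpassing the 0.580 accuracy of the Qwen3-VL-8B-Instruct foundation model while using half as many parameters, and reduced inference latency from 31 seconds to 9 seconds.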
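The abstract mentions adaptive reward functions but does not specify them. As a purely illustrative sketch of what an answer-correctness reward for chart QA might look like, the snippet below scores a prediction against a reference answer. The function names, the 5% relative tolerance (borrowed from the relaxed-accuracy convention common in chart QA evaluation), and the binary 0/1 scoring are all assumptions, not the authors' method.

```python
def to_float(text):
    """Parse a string such as '1,200' or '42%' as a number; return None on failure."""
    try:
        return float(text.strip().rstrip("%").replace(",", ""))
    except ValueError:
        return None


def answer_reward(prediction, target, rel_tol=0.05):
    """Hypothetical binary reward: 1.0 for a correct answer, 0.0 otherwise.

    Numeric targets are matched within a relative tolerance (assumed 5%);
    all other targets are matched by case-insensitive string equality.
    """
    target_num = to_float(target)
    if target_num is not None:
        pred_num = to_float(prediction)
        if pred_num is None:
            return 0.0
        if target_num == 0:
            # Relative error is undefined at zero, so require an exact match.
            return 1.0 if pred_num == 0 else 0.0
        rel_err = abs(pred_num - target_num) / abs(target_num)
        return 1.0 if rel_err <= rel_tol else 0.0
    return 1.0 if prediction.strip().lower() == target.strip().lower() else 0.0


if __name__ == "__main__":
    print(answer_reward("41.0", "42"))    # numeric, within tolerance
    print(answer_reward("Europe", "Asia"))  # textual mismatch
```

A tolerance-based numeric match is one plausible way to soften the penalty for the imprecise numerical extraction the abstract identifies; an actual policy-optimization setup would likely combine it with other reward terms.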

Original Abstract

The recent advancements in Vision Language Models (VLMs) have demonstrated progress toward true intelligence requiring robust reasoning capabilities. Beyond pattern recognition, linguistic reasoning must integrate with visual comprehension, particularly for Chart Question Answering (CQA) tasks involving complex data visualizations. Current VLMs face significant limitations in CQA, including imprecise numerical extraction, difficulty interpreting implicit visual relationships, and inadequate attention mechanisms for capturing spatial relationships in charts. In this work, we address these challenges by presenting Chart-RL, a novel reinforcement learning framework that enhances VLMs' chart understanding through feedback-driven policy optimization of visual perception and logical inference. Our key innovation includes a comprehensive framework integrating Reinforcement Learning (RL) from Policy Optimization techniques along with adaptive reward functions that demonstrates superior performance compared to baseline foundation models and competitive results against larger state-of-the-art architectures. We also integrated Parameter-Efficient Fine-Tuning through Low-Rank Adaptation (LoRA) in the RL framework that only requires single GPU configurations while preserving performance integrity. We conducted extensive benchmarking across open-source, proprietary, and state-of-the-art closed-source models utilizing the ChartQAPro dataset. The RL fine-tuned Qwen3-VL-4B-Instruct model achieved an answer accuracy of 0.634, surpassing the 0.580 accuracy of the Qwen3-VL-8B-Instruct foundation model despite utilizing half the parameter count, while simultaneously reducing inference latency from 31 seconds to 9 seconds.
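To put the reported figures in perspective, the short calculation below derives the absolute and relative accuracy gain and the latency reduction from the four numbers quoted in the abstract. The variable and function names are illustrative, and the abstract does not state which configuration the 31-second latency belongs to, so the values are labelled only as "before" and "after".

```python
# Figures as reported in the abstract.
ACC_4B_RL = 0.634         # RL fine-tuned Qwen3-VL-4B-Instruct
ACC_8B_BASE = 0.580       # Qwen3-VL-8B-Instruct foundation model
LATENCY_BEFORE_S = 31.0   # reported starting inference latency, in seconds
LATENCY_AFTER_S = 9.0     # reported reduced inference latency, in seconds


def summarize(acc_new, acc_old, lat_old, lat_new):
    """Derive simple comparison metrics from the reported numbers."""
    return {
        "abs_acc_gain": acc_new - acc_old,
        "rel_acc_gain": (acc_new - acc_old) / acc_old,
        "latency_reduction": (lat_old - lat_new) / lat_old,
        "speedup": lat_old / lat_new,
    }


summary = summarize(ACC_4B_RL, ACC_8B_BASE, LATENCY_BEFORE_S, LATENCY_AFTER_S)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under these numbers, the accuracy gain is 0.054 absolute (about 9.3% relative), and the latency drop is roughly 71%, or about a 3.4x speedup.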

0 Citations

0 Influential

2 Altmetric

10.0 Score
