2601.22162v1 Jan 09, 2026 q-fin.GN

UniFinEval: Towards Unified Evaluation of Financial Multimodal Models across Text, Images and Videos

Zhi Yang

Citations: 16

h-index: 2

Xin Guo

Citations: 150

h-index: 3

Rongjunchen Zhang

Citations: 86

h-index: 5

Liwen Zhang

Citations: 11

h-index: 2

Fangqi Lou

Citations: 91

h-index: 3

Zhaowei Liu

Citations: 174

h-index: 5

Zhenxiong Yu

Citations: 2

h-index: 1

Zhiheng Jin

Citations: 2

h-index: 1

Lingfeng Zeng

Citations: 83

h-index: 3

Qi Qi

Citations: 25

h-index: 3

Wei Zhang

Citations: 22

h-index: 3

Zhenyu Wu

Citations: 66

h-index: 4

Jun Han

Citations: 33

h-index: 2

Lejie Zhang

Citations: 7

h-index: 1

Xiaoming Huang

Citations: 3

h-index: 1

Xiaolong Liang

Citations: 3

h-index: 1

Zheng Wei

Citations: 24

h-index: 2

Junbo Zou

Citations: 41

h-index: 4

Dongpo Cheng

Citations: 9

h-index: 2

Abstract

Multimodal large language models (MLLMs) are playing an increasingly significant role in empowering the financial domain. However, the challenges they face, such as multimodal, high-density information and cross-modal multi-hop reasoning, go beyond the evaluation scope of existing multimodal benchmarks. To address this gap, we propose UniFinEval, the first unified multimodal benchmark designed for high-information-density financial environments, covering text, images, and videos. UniFinEval systematically constructs five core financial scenarios grounded in real-world financial systems: Financial Statement Auditing, Company Fundamental Reasoning, Industry Trend Insights, Financial Risk Sensing, and Asset Allocation Analysis. We manually construct a high-quality dataset of 3,767 question-answer pairs in both Chinese and English and systematically evaluate 10 mainstream MLLMs under Zero-Shot and CoT settings. Results show that Gemini-3-pro-preview achieves the best overall performance, yet still exhibits a substantial gap compared to financial experts. Further error analysis reveals systematic deficiencies in current models. UniFinEval aims to provide a systematic assessment of MLLMs' capabilities in fine-grained, high-information-density financial environments, thereby enhancing the robustness of MLLM applications in real-world financial scenarios. Data and code are available at https://github.com/aifinlab/UniFinEval.

1 Citation

0 Influential

27.993061443341 Altmetric

141.0 Score
