2602.10886v1 Feb 11, 2026 cs.CL

The CLEF-2026 FinMMEval Lab: Multilingual and Multimodal Evaluation of Financial AI Systems

Preslav Nakov

Citations: 8,470

h-index: 49

R. Elbadry

Citations: 8

h-index: 2

D. Dimitrov

Citations: 199

h-index: 8

Xueqing Peng

Citations: 656

h-index: 13

Lingfei Qian

Citations: 359

h-index: 10

Jimin Huang

Citations: 249

h-index: 8

Zhuohan Xie

Citations: 327

h-index: 9

Fan Zhang

Citations: 3

h-index: 1

Georgi N. Georgiev

Citations: 178

h-index: 5

Vanshikaa Jani

Citations: 2

h-index: 1

Yuyang Dai

Citations: 10

h-index: 2

Jiahui Geng

Citations: 826

h-index: 13

Yuxia Wang

Citations: 28

h-index: 3

Ivan Koychev

Citations: 2,575

h-index: 26

Veselin Stoyanov

Citations: 21

h-index: 3

Abstract

We present the setup and the tasks of the FinMMEval Lab at CLEF 2026, which introduces the first multilingual and multimodal evaluation framework for financial Large Language Models (LLMs). While recent advances in financial natural language processing have enabled automated analysis of market reports, regulatory documents, and investor communications, existing benchmarks remain largely monolingual, text-only, and limited to narrow subtasks. FinMMEval 2026 addresses this gap by offering three interconnected tasks that span financial understanding, reasoning, and decision-making: Financial Exam Question Answering, Multilingual Financial Question Answering (PolyFiQA), and Financial Decision Making. Together, these tasks provide a comprehensive evaluation suite that measures models' ability to reason, generalize, and act across diverse languages and modalities. The lab aims to promote the development of robust, transparent, and globally inclusive financial AI systems, with datasets and evaluation resources publicly released to support reproducible research.

2 Citations

0 Influential

24.5 Altmetric

124.5 Score
