2603.24943v1 Mar 26, 2026 cs.AI


FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

Yong Liu, Jie Zhu, Yimin Tian, Boyan Li, Ke Wu, Zhongzhi Liang, Junhui Li, Xianyin Zhang, Lifan Guo, Feng Chen, Chi Zhang


Abstract

This paper introduces **FinMCP-Bench**, a novel benchmark for evaluating large language models (LLMs) in solving real-world financial problems through tool invocation of financial model context protocols. FinMCP-Bench contains 613 samples spanning 10 main scenarios and 33 sub-scenarios, featuring both real and synthetic user queries to ensure diversity and authenticity. It incorporates 65 real financial MCPs and three types of samples (single-tool, multi-tool, and multi-turn), allowing evaluation of models across different levels of task complexity. Using this benchmark, we systematically assess a range of mainstream LLMs and propose metrics that explicitly measure tool invocation accuracy and reasoning capabilities. FinMCP-Bench provides a standardized, practical, and challenging testbed for advancing research on financial LLM agents.
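The abstract says FinMCP-Bench measures tool invocation accuracy but does not define the metric on this page. As a rough illustration only, the sketch below assumes one plausible formulation. It computes multiset precision, recall, and F1 between a model's tool calls and a gold list, with an option to ignore arguments, and adds a per-type average over the single-tool, multi-tool, and multi-turn splits. `ToolCall`, `tool_call_f1`, `mean_f1_by_type`, and the tool names in the example are hypothetical. They are not the benchmark's actual schema or scoring code.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One MCP tool invocation (hypothetical schema): tool name plus sorted (key, value) args."""
    name: str
    args: tuple[tuple[str, str], ...] = ()


def tool_call_f1(predicted: list[ToolCall], gold: list[ToolCall],
                 match_args: bool = True) -> dict[str, float]:
    """Multiset precision/recall/F1 of predicted vs. gold tool calls.

    With match_args=True a call counts only if name AND arguments match;
    with match_args=False only the tool name is compared.
    Convention assumed here: an empty side yields 0.0 for that score.
    """
    key = (lambda c: c) if match_args else (lambda c: c.name)
    pred_counts = Counter(key(c) for c in predicted)
    gold_counts = Counter(key(c) for c in gold)
    # Multiset intersection keeps the minimum count per key,
    # so a duplicated call is only credited as often as it appears in gold.
    matched = sum((pred_counts & gold_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def mean_f1_by_type(results: list[tuple[str, float]]) -> dict[str, float]:
    """Average per-sample F1 within each sample type, e.g. 'single_tool' or 'multi_turn'."""
    totals: dict[str, float] = defaultdict(float)
    counts: Counter = Counter()
    for sample_type, f1 in results:
        totals[sample_type] += f1
        counts[sample_type] += 1
    return {t: totals[t] / counts[t] for t in totals}


if __name__ == "__main__":
    # Hypothetical financial tools, for illustration only.
    gold = [ToolCall("get_stock_quote", (("ticker", "AAPL"),)),
            ToolCall("get_fx_rate", (("pair", "USD/CNY"),))]
    pred = [ToolCall("get_stock_quote", (("ticker", "AAPL"),)),
            ToolCall("get_fx_rate", (("pair", "USD/EUR"),))]
    print(tool_call_f1(pred, gold))                    # strict: names and args
    print(tool_call_f1(pred, gold, match_args=False))  # lenient: names only
```

In this example the model picks both correct tools but passes the wrong currency pair to the second one. The strict score therefore gives precision, recall, and F1 of 0.5, while the name-only score gives 1.0. Reporting both separates "chose the right tool" from "called it correctly," which is one way to make tool-use accuracy explicit.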
