arXiv:2605.14355v1 | May 14, 2026 | cs.AI

Herculean: An Agentic Benchmark for Financial Intelligence

Sophia Ananiadou, R. Elbadry, Jun'ichi Tsujii, Yueru He, Xueqing Peng, V. Zhang, Yan Wang, Lingfei Qian, Jimin Huang, Jian-Yun Nie, Linhai Ma, Alejandro Lopez-Lira, Zhuohan Xie, Yuyang Dai, Arman Cohan, Jiahuan Pei, Fuyuan Lyu (McGill University), Qiyuan Zhang, Haolun Wu, Xiaoyu Wang, Ye Yuan, Xue Liu, Fengbin Zhu (National University of Singapore), Yilun Zhao, Yangyang Yu, Fengran Mo, Haohang Li, Weijin Liu, A. Xu, Yupeng Cao, Huan He, Polydoros Giannouris, Yuechen Jiang, Xuguang Ai, Ruoyu Xiang, Yikun Han, Shu-Yu Wang, Yuqing Guo, M. Jiang, You Dong, Yankai Chen, Yonghan Yang, Zichen Zhao, Fangzhao Zhang, Ayesha Gull, Muhammad Usman Safder, Nuo Chen, Tianshi Cai, Zimu Wang, Zhiwei Liu, M. Kabir, Yuyan Wang, Yixiang Zheng, Wenbo Cao, Pengyuan Lu, Jerry Huang, Prayag Tiwari, Yijiang Zhao, Víctor Gutiérrez Basulto, Xiao-Yang Liu, Kaleb Smith, Yuehua Tang, Xi Chen, Mingquan Lin

Original Abstract

As AI agents improve, the central question is no longer whether they can solve isolated, well-defined financial tasks, but whether they can reliably carry out the work of financial professionals. Existing financial benchmarks offer only a partial view of this ability, as they primarily evaluate static competencies such as question answering, retrieval, summarization, and classification. We introduce Herculean, the first skill-based benchmark for agentic financial intelligence, spanning four representative workflows: Trading, Hedging, Market Insights, and Auditing. Each workflow is instantiated as a standardized MCP-based skill environment with its own tools, interaction dynamics, constraints, and success criteria, enabling consistent end-to-end assessment of heterogeneous agent systems. Across frontier agents, we find that agents perform relatively well on Trading and Market Insights but struggle substantially on Hedging and Auditing, where long-horizon coordination, state consistency, and structured verification are critical. Overall, our results point to a key gap in current agents: turning financial reasoning into dependable execution of high-stakes financial workflows.
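The abstract describes each workflow as a standardized environment that bundles tools, constraints, and success criteria so that different agents can be scored end to end. As a rough illustration only, the Python sketch below shows one way those pieces could fit together. Every name in it (`SkillEnvironment`, `run_episode`, `make_toy_env`, the toy `trade` tool and its position limit of 100) is hypothetical and not taken from Herculean. Tools here are plain callables rather than MCP servers, and the agent is stood in for by a fixed action list, whereas a real agent would choose each action interactively.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical tool signature: (mutable environment state, tool arguments) -> observation.
Tool = Callable[[dict, dict], dict]


@dataclass
class SkillEnvironment:
    """Illustrative container for one workflow; not Herculean's actual interface."""
    name: str
    tools: Dict[str, Tool]
    constraints: List[Callable[[dict], bool]]  # each check should hold after every step
    success: Callable[[dict], bool]            # end-of-episode success criterion
    state: dict = field(default_factory=dict)

    def step(self, tool_name: str, args: dict) -> dict:
        """Dispatch one tool call against the shared state."""
        if tool_name not in self.tools:
            raise KeyError(f"unknown tool: {tool_name}")
        return self.tools[tool_name](self.state, args)

    def violations(self) -> int:
        """Count constraints that the current state breaks."""
        return sum(not check(self.state) for check in self.constraints)


def run_episode(env: SkillEnvironment, actions: List[Tuple[str, dict]]) -> dict:
    """Replay actions and score the whole trajectory, not just the final state."""
    violated = False
    for tool_name, args in actions:
        env.step(tool_name, args)
        if env.violations():
            violated = True  # any mid-trajectory breach counts against the episode
    return {"success": env.success(env.state) and not violated, "violated": violated}


def trade(state: dict, args: dict) -> dict:
    """Toy tool: adjust a single net position by args["qty"]."""
    state["position"] = state.get("position", 0) + args["qty"]
    return {"position": state["position"]}


def make_toy_env() -> SkillEnvironment:
    """Toy hedging task: finish flat without ever exceeding the position limit."""
    return SkillEnvironment(
        name="toy-hedging",
        tools={"trade": trade},
        constraints=[lambda s: abs(s.get("position", 0)) <= 100],
        success=lambda s: s.get("position", 0) == 0,
    )


# Usage: open a position, then offset it fully while staying within the limit.
result = run_episode(make_toy_env(), [("trade", {"qty": 50}), ("trade", {"qty": -50})])
```

Treating any constraint breach along the trajectory as a failure, rather than inspecting only the final state, is one way to reflect the abstract's emphasis on state consistency over long horizons; that scoring rule is an assumption of this sketch, not a detail reported for Herculean.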
