2605.14355v1 May 14, 2026 cs.AI

Herculean: An Agentic Benchmark for Financial Intelligence

Sophia Ananiadou (Citations: 52; h-index: 3)
R. Elbadry (Citations: 8; h-index: 2)
Jun'ichi Tsujii (Citations: 145; h-index: 5)
Yueru He (Citations: 522; h-index: 9)
Xueqing Peng (Citations: 656; h-index: 13)
V. Zhang (Citations: 598; h-index: 5)
Yan Wang (Citations: 289; h-index: 9)
Lingfei Qian (Citations: 359; h-index: 10)
Jimin Huang (Citations: 249; h-index: 8)
Jian-Yun Nie (Citations: 453; h-index: 11)
Linhai Ma (Citations: 8; h-index: 2)
Alejandro Lopez-Lira (Citations: 1,199; h-index: 13)
Zhuohan Xie (Citations: 327; h-index: 9)
Yuyang Dai (Citations: 10; h-index: 2)
Arman Cohan (Citations: 2,541; h-index: 26)
Jiahuan Pei (Citations: 105; h-index: 7)
Fuyuan Lyu, McGill University (Citations: 659; h-index: 15)
Qiyuan Zhang (Citations: 402; h-index: 8)
Haolun Wu (Citations: 30; h-index: 3)
Xiaoyu Wang (Citations: 1; h-index: 1)
Ye Yuan (Citations: 27; h-index: 3)
Xue Liu (Citations: 7; h-index: 2)
Fengbin Zhu, National University of Singapore (Citations: 1,375; h-index: 14)
Yilun Zhao (Citations: 63; h-index: 3)
Yangyang Yu (Citations: 845; h-index: 10)
Fengran Mo (Citations: 69; h-index: 4)
Haohang Li (Citations: 842; h-index: 10)
Weijin Liu (Citations: 5; h-index: 1)
A. Xu (Citations: 0; h-index: 0)
Yupeng Cao (Citations: 459; h-index: 10)
Huan He (Citations: 20; h-index: 2)
Polydoros Giannouris (Citations: 54; h-index: 4)
Yuechen Jiang (Citations: 497; h-index: 7)
Xuguang Ai (Citations: 10; h-index: 2)
Ruoyu Xiang (Citations: 278; h-index: 6)
Yikun Han (Citations: 310; h-index: 6)
Shu-Yu Wang (Citations: 3; h-index: 1)
Yuqing Guo (Citations: 169; h-index: 3)
M. Jiang (Citations: 157; h-index: 3)
You Dong (Citations: 78; h-index: 4)
Yankai Chen (Citations: 10; h-index: 2)
Yonghan Yang (Citations: 3; h-index: 1)
Zichen Zhao (Citations: 3; h-index: 1)
Fangzhao Zhang (Citations: 121; h-index: 5)
Ayesha Gull (Citations: 3; h-index: 1)
Muhammad Usman Safder (Citations: 9; h-index: 2)
Nuo Chen (Citations: 125; h-index: 5)
Tianshi Cai (Citations: 3; h-index: 1)
Zimu Wang (Citations: 8; h-index: 2)
Zhiwei Liu (Citations: 571; h-index: 9)
M. Kabir (Citations: 7; h-index: 1)
Yuyan Wang (Citations: 4; h-index: 1)
Yixiang Zheng (Citations: 422; h-index: 3)
Wenbo Cao (Citations: 23; h-index: 3)
Pengyuan Lu (Citations: 83; h-index: 5)
Jerry Huang (Citations: 71; h-index: 3)
Prayag Tiwari (Citations: 7; h-index: 2)
Yijiang Zhao (Citations: 17; h-index: 2)
Víctor Gutiérrez Basulto (Citations: 10; h-index: 1)
Xiao-Yang Liu (Citations: 362; h-index: 8)
Kaleb Smith (Citations: 94; h-index: 4)
Yuehua Tang (Citations: 154; h-index: 4)
Xi Chen (Citations: 552; h-index: 10)
Mingquan Lin (Citations: 962; h-index: 17)

Abstract

As AI agents improve, the central question is no longer whether they can solve isolated, well-defined financial tasks, but whether they can reliably carry out the work of financial professionals. Existing financial benchmarks offer only a partial view of this ability, as they primarily evaluate static competencies such as question answering, retrieval, summarization, and classification. We introduce Herculean, the first skilled benchmark for agentic financial intelligence, spanning four representative workflows: Trading, Hedging, Market Insights, and Auditing. Each workflow is instantiated as a standardized MCP-based skill environment with its own tools, interaction dynamics, constraints, and success criteria, enabling consistent end-to-end assessment of heterogeneous agent systems. Across frontier agents, we find that agents perform relatively well on Trading and Market Insights but struggle substantially on Hedging and Auditing, where long-horizon coordination, state consistency, and structured verification are critical. Overall, our results point to a key gap in current agents: turning financial reasoning into dependable execution in high-stakes financial workflows.
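
To make the idea of a skill environment "with its own tools, interaction dynamics, constraints, and success criteria" more concrete, the following is a minimal illustrative sketch in Python. It is not the Herculean implementation and does not use the MCP protocol or any MCP SDK; every name in it (`SkillEnvironment`, `run_episode`, the toy `buy` tool, and the two toy agents) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical types for this sketch only (not taken from the paper).
State = Dict[str, Any]
Tool = Callable[[State, Dict[str, Any]], State]
# An agent maps the current state to (tool name, arguments), or None to stop.
Agent = Callable[[State], Optional[Tuple[str, Dict[str, Any]]]]


@dataclass
class SkillEnvironment:
    """One workflow: its tools, constraints, and success criterion."""
    name: str
    tools: Dict[str, Tool]
    constraints: List[Callable[[State], bool]]
    success: Callable[[State], bool]
    max_steps: int = 10


def run_episode(env: SkillEnvironment, agent: Agent, initial_state: State) -> Dict[str, Any]:
    """Run an agent end to end and report whether the success criterion holds."""
    state = dict(initial_state)
    for _ in range(env.max_steps):
        action = agent(state)
        if action is None:  # the agent chose to stop
            break
        tool_name, args = action
        if tool_name not in env.tools:
            return {"success": False, "reason": f"unknown tool: {tool_name}", "state": state}
        new_state = env.tools[tool_name](state, args)
        # Reject any step that leaves the environment in an invalid state.
        if not all(check(new_state) for check in env.constraints):
            return {"success": False, "reason": "constraint violated", "state": state}
        state = new_state
    return {"success": env.success(state), "reason": "finished", "state": state}


def buy(state: State, args: Dict[str, Any]) -> State:
    """Toy tool: buy `qty` units at the current price."""
    qty = args["qty"]
    return {
        **state,
        "cash": state["cash"] - qty * state["price"],
        "position": state["position"] + qty,
    }


toy_env = SkillEnvironment(
    name="toy-trading",
    tools={"buy": buy},
    constraints=[lambda s: s["cash"] >= 0],  # never overdraw the account
    success=lambda s: s["position"] >= 2,    # goal: hold at least two units
    max_steps=5,
)


def cautious_agent(state: State) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Buys one unit per step until the target position is reached."""
    return ("buy", {"qty": 1}) if state["position"] < 2 else None


def greedy_agent(state: State) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Tries to buy three units at once, which overdraws the account."""
    return ("buy", {"qty": 3})


start = {"cash": 250.0, "price": 100.0, "position": 0}
cautious_result = run_episode(toy_env, cautious_agent, start)
greedy_result = run_episode(toy_env, greedy_agent, start)
```

In this toy setup the cautious agent reaches the target position within budget, while the greedy agent fails on its first step because the cash constraint is checked after every tool call. Keeping constraints and the success criterion inside the environment, rather than inside the agent, is one way heterogeneous agents could be scored against the same rules.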

0 Citations

0 Influential

13 Altmetric

65.0 Score
