2601.08676v2 Jan 13, 2026 cs.AI

Advancing ESG Intelligence: An Expert-level Agent and Comprehensive Benchmark for Sustainable Finance

Yilei Zhao (Citations: 328, h-index: 6)
Wentao Zhang (Citations: 85, h-index: 3)
Wei Yang Bryan Lim (Citations: 6, h-index: 2)
Lei Xiao (Citations: 40, h-index: 5)
Yandan Zheng (Citations: 66, h-index: 3)
Meng Liu (Citations: 101, h-index: 6)

Abstract

Environmental, social, and governance (ESG) criteria are essential for evaluating corporate sustainability and ethical performance. However, professional ESG analysis is hindered by data fragmentation across unstructured sources, and existing large language models (LLMs) often struggle with the complex, multi-step workflows required for rigorous auditing. To address these limitations, we introduce ESGAgent, a hierarchical multi-agent system empowered by a specialized toolset, including retrieval augmentation, web search and domain-specific functions, to generate in-depth ESG analysis. Complementing this agentic system, we present a comprehensive three-level benchmark derived from 310 corporate sustainability reports, designed to evaluate capabilities ranging from atomic common-sense questions to the generation of integrated, in-depth analysis. Empirical evaluations demonstrate that ESGAgent outperforms state-of-the-art closed-source LLMs with an average accuracy of 84.15% on atomic question-answering tasks, and excels in professional report generation by integrating rich charts and verifiable references. These findings confirm the diagnostic value of our benchmark, establishing it as a vital testbed for assessing general and advanced agentic capabilities in high-stakes vertical domains.
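The abstract describes ESGAgent only at a high level: a hierarchical multi-agent system whose toolset includes retrieval augmentation, web search, and domain-specific functions. The sketch below illustrates that general pattern and is not the authors' implementation. All names (`Planner`, `SubTask`, `retrieve`, `web_search`, `esg_metric`) are hypothetical, the tools are stubs that return placeholder strings, and the fixed plan stands in for what would presumably be an LLM-driven planning step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical tool stubs. The abstract names the tool categories but not
# their interfaces, so each stub just returns a tagged placeholder string.
def retrieve(query: str) -> str:
    return f"[retrieval] passages for: {query}"


def web_search(query: str) -> str:
    return f"[web] results for: {query}"


def esg_metric(query: str) -> str:
    return f"[domain] metric lookup for: {query}"


@dataclass
class SubTask:
    tool: str   # key into the planner's tool registry
    query: str  # text handed to that tool


class Planner:
    """Top-level agent: splits a request into sub-tasks and delegates each one."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]) -> None:
        self.tools = tools

    def plan(self, request: str) -> List[SubTask]:
        # Fixed three-step plan, for illustration only (an assumption).
        return [
            SubTask("retrieve", request),
            SubTask("web_search", request),
            SubTask("esg_metric", request),
        ]

    def run(self, request: str) -> List[str]:
        results = []
        for task in self.plan(request):
            if task.tool not in self.tools:
                raise KeyError(f"unknown tool: {task.tool}")
            results.append(self.tools[task.tool](task.query))
        return results


planner = Planner(
    {"retrieve": retrieve, "web_search": web_search, "esg_metric": esg_metric}
)
evidence = planner.run("Scope 1 emissions")
for line in evidence:
    print(line)
```

In a fuller system, the collected evidence would be passed to a report-writing step. Keeping the tools behind a plain name-to-callable registry is one simple way to let the planner stay independent of any specific tool.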
