2602.07342v1 Feb 07, 2026 cs.AI

SupChain-Bench: Benchmarking Large Language Models for Real-World Supply Chain Management

Lang Cao (Citations: 93, h-index: 5)

Sheng Guan (Citations: 125, h-index: 4)

Yihao Liu (Citations: 327, h-index: 9)

Abstract

Large language models (LLMs) have shown promise in complex reasoning and tool-based decision making, motivating their application to real-world supply chain management. However, supply chain workflows require reliable long-horizon, multi-step orchestration grounded in domain-specific procedures, which remains challenging for current models. To systematically evaluate LLM performance in this setting, we introduce SupChain-Bench, a unified real-world benchmark that assesses both supply chain domain knowledge and long-horizon tool-based orchestration grounded in standard operating procedures (SOPs). Our experiments reveal substantial gaps in execution reliability across models. We further propose SupChain-ReAct, an SOP-free framework that autonomously synthesizes executable procedures for tool use, achieving the strongest and most consistent tool-calling performance. Our work establishes a principled benchmark for studying reliable long-horizon orchestration in real-world operational settings and highlights significant room for improvement in LLM-based supply chain agents.
