2603.21522v1 Mar 23, 2026 cs.SE


Efficient Failure Management for Multi-Agent Systems with Reasoning Trace Representation

Gong Zhang, Lingzhe Zhang, Tong Jia, Chiming Duan, Minghua He, Ying Li, Weijie Hong, Mingyuan Wang, Rongqian Wang, Xi Peng, Meiling Wang, Renhai Chen

Abstract

Large Language Model (LLM)-based Multi-Agent Systems (MASs) have emerged as a new paradigm in software system design and increasingly demonstrate strong reasoning and collaboration capabilities. As these systems become more complex and autonomous, effective failure management is essential to ensure reliability and availability. However, existing approaches often rely on per-trace reasoning, which is inefficient, and neglect historical failure patterns, which limits diagnostic accuracy. In this paper, we conduct a preliminary empirical study to demonstrate the necessity, potential, and challenges of leveraging historical failure patterns to enhance failure management in MASs. Building on this insight, we propose EAGER, an efficient failure management framework for multi-agent systems based on reasoning trace representation. EAGER employs unsupervised reasoning-scoped contrastive learning to encode both intra-agent reasoning and inter-agent coordination, enabling real-time step-wise failure detection, diagnosis, and reflexive mitigation guided by historical failure knowledge. Preliminary evaluations on three open-source MASs demonstrate the effectiveness of EAGER and highlight promising directions for future research in reliable multi-agent system operations.
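
The abstract does not specify how step-wise detection consults historical failure knowledge. As a rough illustration only, and not EAGER's actual method, the sketch below assumes each reasoning step has already been encoded as a vector and compares it against a small bank of embeddings of past failures using cosine similarity. The function names, the failure labels, and the 0.9 threshold are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_failure(step_embedding, failure_bank, threshold=0.9):
    """Find the closest historical failure for one reasoning step.

    failure_bank maps a (hypothetical) failure label to its embedding.
    Returns (label, similarity) when the best match reaches the threshold,
    otherwise (None, similarity) so the step is treated as normal.
    """
    best_label, best_sim = None, -1.0
    for label, vector in failure_bank.items():
        sim = cosine(step_embedding, vector)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim >= threshold:
        return best_label, best_sim
    return None, best_sim


# Toy embeddings; a real system would obtain these from a trained encoder.
failure_bank = {
    "tool_call_loop": [1.0, 0.0, 0.0],
    "role_confusion": [0.0, 1.0, 0.0],
}

flagged, _ = match_failure([0.9, 0.1, 0.0], failure_bank)
normal, _ = match_failure([0.5, 0.5, 0.7], failure_bank)
```

Because the lookup is a single pass over stored vectors rather than a fresh LLM call per trace, it hints at why reusing historical patterns could be cheaper than per-trace reasoning. The encoder itself, which the abstract says is trained with contrastive learning, is outside the scope of this sketch.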

