2603.02630v1 Mar 03, 2026 cs.LG

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks

Mingze Kong (Citations: 17, h-index: 3)
Zhiwei Shang (Citations: 9, h-index: 2)
Zhongxiang Dai (Citations: 65, h-index: 5)
Zhiqing Hong (Citations: 12, h-index: 2)
Jiahang Sun (Citations: 8, h-index: 2)
Xiangyi Wang (Citations: 6, h-index: 1)
Yao Shu (Citations: 8, h-index: 2)
Qian Zhang (Citations: 2, h-index: 1)

Abstract

Large Language Models (LLMs) have achieved great success in many real-world applications, especially as the cognitive backbone of Multi-Agent Systems (MAS) that orchestrate complex workflows in practice. Because many deployment scenarios preclude modifying the MAS workflow, and because MAS performance is highly sensitive to the input prompts, prompt optimization emerges as a more natural way to improve performance. However, real-world prompt optimization for MAS is impeded by three key challenges: (1) the need for sample efficiency due to prohibitive evaluation costs, (2) topology-induced coupling among prompts, and (3) the combinatorial explosion of the search space. To address these challenges, we introduce MASPOB (Multi-Agent System Prompt Optimization via Bandits), a novel sample-efficient framework based on bandits. By leveraging the Upper Confidence Bound (UCB) to quantify uncertainty, the bandit framework balances exploration and exploitation, maximizing gains within a strictly limited budget. To handle topology-induced coupling, MASPOB integrates Graph Neural Networks (GNNs) to capture structural priors, learning topology-aware representations of prompt semantics. Furthermore, it employs coordinate ascent to decompose the optimization into univariate sub-problems, reducing search complexity from exponential to linear. Extensive experiments across diverse benchmarks demonstrate that MASPOB achieves state-of-the-art performance, consistently outperforming existing baselines.
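The abstract's last two ideas, a UCB-style score and coordinate ascent over per-agent prompts, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the count-based UCB below is a simplified stand-in for the GNN-based uncertainty estimate the abstract describes, and the names `ucb_score` and `coordinate_ascent` are hypothetical. The sketch only shows why sweeping one agent at a time costs a number of evaluations proportional to the sum of the candidate counts per sweep, rather than their product.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Combo = Tuple[int, ...]  # one candidate-prompt index per agent


def ucb_score(combo: Combo,
              stats: Dict[Combo, Tuple[int, float]],
              total_pulls: int,
              c: float = 1.0) -> float:
    """Count-based UCB (an assumed simplification): mean reward plus bonus.

    stats maps a combination to (times evaluated, summed reward).
    Rarely evaluated combinations receive a larger exploration bonus.
    """
    count, reward_sum = stats.get(combo, (0, 0.0))
    mean = reward_sum / count if count else 0.0
    bonus = c * math.sqrt(math.log(total_pulls + 1) / (count + 1))
    return mean + bonus


def coordinate_ascent(acquisition: Callable[[Combo], float],
                      num_candidates: Sequence[int],
                      start: Combo,
                      max_sweeps: int = 5) -> Tuple[Combo, int]:
    """Maximize `acquisition` one agent at a time, holding the others fixed.

    Returns the final combination and the number of acquisition calls made.
    """
    current = list(start)
    calls = 0
    for _ in range(max_sweeps):
        changed = False
        for agent, k in enumerate(num_candidates):
            best_choice, best_value = current[agent], -math.inf
            for choice in range(k):
                trial = tuple(current[:agent] + [choice] + current[agent + 1:])
                value = acquisition(trial)
                calls += 1
                if value > best_value:
                    best_choice, best_value = choice, value
            if best_choice != current[agent]:
                current[agent] = best_choice
                changed = True
        if not changed:  # a full sweep with no update: local optimum reached
            break
    return tuple(current), calls


if __name__ == "__main__":
    # Toy acquisition with a known maximizer; three agents, four prompts each.
    target = (2, 0, 3)
    toy = lambda combo: -sum((x - t) ** 2 for x, t in zip(combo, target))
    best, calls = coordinate_ascent(toy, [4, 4, 4], (0, 0, 0))
    print(best, calls, "joint space size:", 4 ** 3)
```

In a bandit loop one would pass something like `lambda combo: ucb_score(combo, stats, total_pulls)` as the acquisition, evaluate the returned combination on the real workflow, and update `stats`. Note that coordinate ascent only guarantees a coordinate-wise local optimum, so the result can depend on the starting combination when the prompts are strongly coupled.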

Citations: 1

Influential: 0

Altmetric: 2.5

Score: 13.5
