2604.17503v1 Apr 19, 2026 cs.AI

SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology

Bo Yin

Xinle Yu

Xiaobin Hu

Jiang-She Zhang

Zheng Nie

Ruolin Shen

Original Abstract

Scaling vision-language models into Visual Multiagent Systems (VMAS) is hindered by two coupled issues. First, communication topologies are fixed before inference, leaving them blind to visual content and query context; second, agent reasoning abilities remain static during deployment. These issues reinforce each other: a rigid topology fails to leverage richer agent expertise, while static agents lack incentives to specialize for a given query. We address this with SkillGraph, a joint framework that evolves both agent expertise and communication topology. Within this framework, a Multimodal Graph Transformer (MMGT) encodes visual tokens, instruction semantics and active skill embeddings to predict a query-conditioned collaboration graph, replacing hand-crafted routing with dynamic, content-aware information flow. Complementing this, a Skill Designer distills and refines reasoning heuristics from failure cases, constructing a self-evolving multimodal Skill Bank. Crucially, updated skill embeddings are fed back into the MMGT, enabling the topology to adapt alongside capability growth. Experiments show that SkillGraph achieves consistent improvements across four benchmarks, five common MAS structures and four base models. Code is available at https://github.com/niez233/skillgraph.
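The abstract describes a loop in which skill embeddings both condition the collaboration graph and are themselves updated after failures. As a rough illustration only, the Python sketch below shows how such a feedback path could behave. It is not the authors' MMGT or Skill Designer; their implementation is in the linked repository. Everything here is a hypothetical stand-in: `SkillBank`, `predict_graph`, mean-pooled fusion of visual tokens with the instruction embedding, a parameter-free dot-product edge scorer in place of a trained graph transformer, the 0.5 edge threshold, and the rule of keeping only forward edges so the graph stays acyclic. Skills are passed in as ready-made vectors rather than distilled from failure cases.

```python
import math
from dataclasses import dataclass, field

Vector = list[float]


def vec_add(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def vec_mean(vs: list[Vector], dim: int) -> Vector:
    """Element-wise mean; zero vector when the list is empty."""
    if not vs:
        return [0.0] * dim
    return [sum(xs) / len(vs) for xs in zip(*vs)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class SkillBank:
    """Hypothetical skill store mapping agent name -> list of skill embeddings."""
    dim: int
    skills: dict[str, list[Vector]] = field(default_factory=dict)

    def add_skill(self, agent: str, embedding: Vector) -> None:
        # Stand-in for the Skill Designer: the caller supplies a ready-made
        # embedding instead of distilling one from a failure case.
        self.skills.setdefault(agent, []).append(embedding)

    def embedding(self, agent: str) -> Vector:
        # Assumption: an agent's active skill embedding is the mean of its skills.
        return vec_mean(self.skills.get(agent, []), self.dim)


def predict_graph(roles: dict[str, Vector], bank: SkillBank,
                  visual: list[Vector], instruction: Vector,
                  threshold: float = 0.5) -> tuple[list[str], dict[tuple[str, str], float]]:
    """Toy query-conditioned graph: returns agent order and kept edges with probabilities."""
    dim = len(instruction)
    # Naive fusion: mean of visual tokens plus the instruction embedding.
    query = vec_add(vec_mean(visual, dim), instruction)
    names = list(roles)
    # Node feature = role embedding + current skill embedding + query context.
    nodes = {n: vec_add(roles[n], bank.embedding(n), query) for n in names}
    edges: dict[tuple[str, str], float] = {}
    for i, src in enumerate(names):
        for dst in names[i + 1:]:  # forward edges only, so the graph is a DAG
            score = dot(nodes[src], nodes[dst]) / math.sqrt(dim)
            prob = 1.0 / (1.0 + math.exp(-score))  # sigmoid
            if prob > threshold:
                edges[(src, dst)] = prob
    return names, edges


if __name__ == "__main__":
    bank = SkillBank(dim=2)
    roles = {"planner": [1.0, 0.0], "perceiver": [0.0, 1.0], "verifier": [-1.0, 0.0]}
    query = {"visual": [[0.0, 0.0]], "instruction": [0.0, 0.0]}
    print(predict_graph(roles, bank, **query)[1])  # {} : no edge clears the threshold
    bank.add_skill("verifier", [1.0, 1.0])  # e.g. a skill learned from a failed query
    print(predict_graph(roles, bank, **query)[1])  # now contains ('perceiver', 'verifier')
```

Running it prints an empty graph first; after a skill vector is added to `verifier`, the same query yields a `perceiver -> verifier` edge. That is the qualitative behavior the abstract claims (topology adapting alongside capability growth), reproduced here with toy vectors. Changing the visual tokens changes the graph in the same way, which is the "query-conditioned" part. A trained model would learn projections for the scorer instead of using raw dot products.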
