2603.01260v1 Mar 01, 2026 cs.LG

MOSAIC: A Unified Platform for Cross-Paradigm Comparison and Evaluation of Homogeneous and Heterogeneous Multi-Agent RL, LLM, VLM, and Human Decision-Makers

Abdulhamid M. Mousa (citations: 5, h-index: 1)

Yuqian Fu (citations: 169, h-index: 8)

Rakhmonberdi Khajiev (citations: 0, h-index: 0)

Jalaledin M. Azzabi (citations: 0, h-index: 0)

Abdulkarim M. Mousa (citations: 0, h-index: 0)

Peng Yang (citations: 7, h-index: 2)

Yunusa Haruna (citations: 109, h-index: 5)

Ming Liu (citations: 41, h-index: 4)

Original Abstract

Reinforcement learning (RL), large language models (LLMs), and vision-language models (VLMs) have been widely studied in isolation. However, existing infrastructure lacks the ability to deploy agents from different decision-making paradigms within the same environment, making it difficult to study them in hybrid multi-agent settings or to compare their behaviour fairly under identical conditions. We present MOSAIC, an open-source platform that bridges this gap by incorporating a diverse set of existing reinforcement learning environments and enabling heterogeneous agents (RL policies, LLMs, VLMs, and human players) to operate within them in ad-hoc team settings with reproducible results. MOSAIC introduces three contributions. (i) An IPC-based worker protocol that wraps both native and third-party frameworks as isolated subprocess workers, each executing its native training and inference logic unmodified, communicating through a versioned inter-process protocol. (ii) An operator abstraction that forms an agent-level interface by mapping workers to agents: each operator, regardless of whether it is backed by an RL policy, an LLM, or a human, conforms to a minimal unified interface. (iii) A deterministic cross-paradigm evaluation framework offering two complementary modes: a manual mode that advances up to N concurrent operators in lock-step under shared seeds for fine-grained visual inspection of behavioural differences, and a script mode that drives automated, long-running evaluation through declarative Python scripts, for reproducible experiments. We release MOSAIC as an open, visual-first platform to facilitate reproducible cross-paradigm research across the RL, LLM, and human-in-the-loop communities.
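The operator abstraction and the lock-step manual mode described in the abstract can be illustrated with a small sketch. This is not MOSAIC's actual API: the names `Operator`, `RandomPolicyOperator`, `ScriptedOperator`, `LineWorld`, and `run_lockstep` are all hypothetical, and the toy environment stands in for a real RL environment. The sketch only assumes what the abstract states, namely that every operator follows one minimal interface regardless of what backs it, and that concurrent operators advance step by step under a shared seed.

```python
import random
from dataclasses import dataclass
from typing import Protocol


class Operator(Protocol):
    """Hypothetical minimal agent-level interface (an assumption, not MOSAIC's API)."""

    def reset(self, seed: int) -> None: ...
    def act(self, observation: int) -> int: ...


@dataclass
class RandomPolicyOperator:
    """Stand-in for an operator backed by an RL policy."""

    n_actions: int = 2

    def reset(self, seed: int) -> None:
        # A private RNG keeps the operator reproducible for a given seed.
        self._rng = random.Random(seed)

    def act(self, observation: int) -> int:
        return self._rng.randrange(self.n_actions)


@dataclass
class ScriptedOperator:
    """Stand-in for an LLM- or human-backed operator: always moves right."""

    def reset(self, seed: int) -> None:
        pass

    def act(self, observation: int) -> int:
        return 1


class LineWorld:
    """Toy environment: an agent walks along a line from a seeded start position."""

    def reset(self, seed: int) -> int:
        self.pos = random.Random(seed).randrange(-2, 3)
        return self.pos

    def step(self, action: int) -> int:
        # Action 1 moves right, anything else moves left.
        self.pos += 1 if action == 1 else -1
        return self.pos


def run_lockstep(operators: list[Operator], seed: int, n_steps: int) -> list[list[int]]:
    """Advance every operator one step at a time, each in its own seeded environment copy."""
    envs = [LineWorld() for _ in operators]
    obs = [env.reset(seed) for env in envs]  # shared seed gives identical starts
    for op in operators:
        op.reset(seed)
    trajectories = [[o] for o in obs]
    for _ in range(n_steps):
        for i, (op, env) in enumerate(zip(operators, envs)):
            obs[i] = env.step(op.act(obs[i]))
            trajectories[i].append(obs[i])
    return trajectories


if __name__ == "__main__":
    ops = [RandomPolicyOperator(), ScriptedOperator()]
    for name, traj in zip(["random", "scripted"], run_lockstep(ops, seed=7, n_steps=5)):
        print(name, traj)
```

Because both operators start from the same seeded state and advance one step at a time, their trajectories can be compared side by side, and rerunning with the same seed reproduces them exactly. The paper's worker protocol additionally isolates each framework in its own subprocess, which this single-process sketch does not attempt to model.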
