2604.11741v1 Apr 13, 2026 cs.AI

Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games

Hefeng Wu (Citations: 212; h-index: 7)
Keyang Zhong (Citations: 84; h-index: 4)
Haofeng Li (Citations: 1,199; h-index: 16)
Junlin Xie (Citations: 129; h-index: 4)
Guanbin Li (Citations: 3,906; h-index: 33)

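Vision-language models (VLMs) perform well on visual perception tasks, but their complex multi-hop reasoning tends to degrade in multiplayer game settings where information is incomplete and deceptive. This paper studies a representative multiplayer game, the murder mystery game, in which players must infer hidden truths from partial clues supplied by roles with differing intentions. To address this problem, we propose a collaborative multi-agent framework for evaluating and generating high-quality, role-driven multiplayer game scripts. The framework enables fine-grained interaction patterns tailored to character identity (e.g., murderer vs. innocent). Through coordinated agent interactions, our system generates rich multimodal contexts that include character backstories, visual and textual clues, and multi-hop reasoning chains. We design a two-stage, agent-monitored training strategy to improve the reasoning ability of VLMs: (1) chain-of-thought based fine-tuning on curated and synthetic datasets that model uncertainty and deception; (2) GRPO reinforcement learning with agent monitoring, which encourages the model to develop character-specific reasoning behaviors and to perform effective multimodal multi-hop reasoning. Extensive experiments show that the proposed method substantially improves VLM performance in narrative reasoning, hidden fact extraction, and deception-resilient understanding. This work offers a scalable solution for training and evaluating VLMs under uncertain, adversarial, and socially complex conditions, and lays the groundwork for future benchmarks in multimodal multi-hop reasoning under imperfect information.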

Original Abstract

Vision-language models (VLMs) have shown impressive capabilities in perceptual tasks, yet they degrade in complex multi-hop reasoning under multiplayer game settings with imperfect and deceptive information. In this paper, we study a representative multiplayer task, Murder Mystery Games, which require inferring hidden truths based on partial clues provided by roles with different intentions. To address this challenge, we propose a collaborative multi-agent framework for evaluating and synthesizing high-quality, role-driven multiplayer game scripts, enabling fine-grained interaction patterns tailored to character identities (i.e., murderer vs. innocent). Our system generates rich multimodal contexts, including character backstories, visual and textual clues, and multi-hop reasoning chains, through coordinated agent interactions. We design a two-stage agent-monitored training strategy to enhance the reasoning ability of VLMs: (1) chain-of-thought based fine-tuning on curated and synthetic datasets that model uncertainty and deception; (2) GRPO-based reinforcement learning with agent-monitored reward shaping, encouraging the model to develop character-specific reasoning behaviors and effective multimodal multi-hop inference. Extensive experiments demonstrate that our method significantly boosts the performance of VLMs in narrative reasoning, hidden fact extraction, and deception-resilient understanding. Our contributions offer a scalable solution for training and evaluating VLMs under uncertain, adversarial, and socially complex conditions, laying the groundwork for future benchmarks in multimodal multi-hop reasoning under imperfect information.
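The abstract names GRPO with agent-monitored reward shaping but gives no formulas. As a rough illustration only, the sketch below shows the group-relative advantage normalization that GRPO is generally built around, paired with a hypothetical shaped reward. The names `shaped_reward`, `monitor_score`, and `weight` are assumptions made for this example and do not come from the paper; the paper's actual reward design may differ.

```python
from statistics import mean, pstdev


def shaped_reward(answer_correct: bool, monitor_score: float, weight: float = 0.5) -> float:
    """Hypothetical shaped reward: task correctness plus a weighted monitor score.

    `monitor_score` stands in for a judgment from a monitoring agent, assumed
    to lie in [0, 1]. This combination is an assumption, not the paper's formula.
    """
    return float(answer_correct) + weight * monitor_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of responses sampled for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps). Population
    standard deviation is used here; implementations vary on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt: (correct?, monitor score)
samples = [(True, 0.8), (False, 0.4), (True, 0.2), (False, 0.0)]
rewards = [shaped_reward(correct, score) for correct, score in samples]
advantages = group_relative_advantages(rewards)
```

Responses scoring above their group's mean receive positive advantages and those below receive negative ones, so no separate value network is needed to provide a baseline.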

4 Citations

0 Influential

16.5 Altmetric

86.5 Score
