2601.16276v1 Jan 22, 2026 cs.CL

GameTalk: Training LLMs for Strategic Conversation

M. Schaar

Citations: 28,140

h-index: 76

Victor Conchello Vendrell

Citations: 2

h-index: 1

Max Ruiz Luyten

Citations: 122

h-index: 7

Abstract

Strategic decision-making in multi-agent settings is a key challenge for large language models (LLMs), particularly when coordination and negotiation must unfold over extended conversations. While recent work has explored the use of LLMs in isolated decision tasks, little attention has been given to optimizing long-term objectives through dialogue. We introduce GameTalk, a framework for training LLMs to make strategic decisions via multi-turn interactions. Unlike prior work that focuses on single-turn objectives or static action prediction, we train LLMs to optimize a global objective across full conversations. We achieve this by adapting fine-tuning methods like GRPO, DPO, and STaR to incorporate reward signals that depend on the entire interaction. We evaluate this approach on a suite of increasingly complex games, designed to stress different aspects of reasoning, coordination, and opponent modeling. Our results show that GameTalk significantly outperforms untrained models, especially under reward shaping, with DPO consistently yielding the strongest gains. These findings position conversational fine-tuning as a promising path for LLMs to reason, negotiate, and act in interactive environments.
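
To make the idea of a reward that depends on the entire interaction more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each sampled conversation has already been scored with a single scalar game payoff, and shows two generic ways such scores are commonly turned into training signals: group-normalized advantages in the style of GRPO, and chosen/rejected pairs in the style of DPO. The function names, the `margin` parameter, and the toy data are hypothetical and do not come from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style signal (assumed form): normalize each conversation-level
    reward against the other conversations sampled for the same game."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def preference_pairs(conversations, rewards, margin=0.0):
    """DPO-style signal (assumed form): pair a higher-reward conversation
    (chosen) with a lower-reward one (rejected) when the gap exceeds margin."""
    ranked = sorted(zip(conversations, rewards), key=lambda cr: cr[1], reverse=True)
    pairs = []
    for i, (chosen, r_hi) in enumerate(ranked):
        for rejected, r_lo in ranked[i + 1:]:
            if r_hi - r_lo > margin:
                pairs.append((chosen, rejected))
    return pairs


if __name__ == "__main__":
    # Hypothetical transcripts and payoffs, for illustration only.
    conversations = ["a", "b", "c", "d"]
    rewards = [3.0, 1.0, 2.0, 1.0]
    print(group_relative_advantages(rewards))
    print(preference_pairs(conversations, rewards))
```

In this sketch the reward is attached to the whole transcript rather than to any single turn, which is the property the abstract emphasizes; how the paper actually assigns credit across turns or shapes rewards is not specified here.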
