2601.12323v2 Jan 18, 2026 cs.AI

MARO: Learning Stronger Reasoning from Social Interaction

Yin Cai (citations: 0, h-index: 0)
Ping Chen (citations: 6, h-index: 2)
Zhouhong Gu (citations: 9, h-index: 2)
Juntao Zhang (citations: 49, h-index: 4)

Abstract

Humans face countless scenarios that require reasoning and judgment in daily life. However, existing large language model training methods primarily allow models to learn from existing textual content or solve predetermined problems, lacking experience in real scenarios involving interaction, negotiation, and competition with others. To address this, this paper proposes Multi-Agent Reward Optimization (MARO), a method that enables large language models (LLMs) to acquire stronger reasoning abilities by learning and practicing in multi-agent social environments. Specifically, MARO first addresses the sparse learning signal problem by decomposing final success or failure outcomes into each specific behavior during the interaction process; second, it handles the uneven role distribution problem by balancing the training sample weights of different roles; finally, it addresses environmental instability issues by directly evaluating the utility of each behavior. Experimental results demonstrate that MARO not only achieves significant improvements in social reasoning capabilities, but also that the abilities acquired through social simulation learning can effectively transfer to other tasks such as mathematical reasoning and instruction following. This reveals the tremendous potential of multi-agent social learning in enhancing the general reasoning capabilities of LLMs.
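The abstract gives no formulas, so the following is only a minimal sketch of how the first two ideas might look in code, not the authors' implementation. It assumes that outcome decomposition means copying the episode's final win or loss onto every step, and that role balancing means inverse-frequency sample weights. The names `Step`, `step_rewards`, `role_weights`, and `weighted_samples`, and the reward values, are hypothetical. The third component, direct utility evaluation of each behavior, is not sketched because the abstract does not describe how it is computed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Step:
    role: str    # role of the acting agent (hypothetical label)
    action: str  # text of the action taken at this step
    won: bool    # whether this agent's side succeeded at the end


def step_rewards(steps, win=1.0, loss=-1.0):
    """Assumed decomposition: copy the final outcome onto every step."""
    return [win if s.won else loss for s in steps]


def role_weights(steps):
    """Assumed balancing: inverse-frequency weights, so every role
    contributes the same total weight regardless of how often it acts."""
    counts = Counter(s.role for s in steps)
    n_roles = len(counts)
    total = len(steps)
    return [total / (n_roles * counts[s.role]) for s in steps]


def weighted_samples(steps):
    """Pair each action with its per-step reward and its role weight."""
    rewards = step_rewards(steps)
    weights = role_weights(steps)
    return [(s.action, r, w) for s, r, w in zip(steps, rewards, weights)]


if __name__ == "__main__":
    episode = [
        Step("majority", "accuse player 3", True),
        Step("majority", "defend player 2", True),
        Step("majority", "vote player 3", True),
        Step("minority", "deflect suspicion", False),
    ]
    for action, reward, weight in weighted_samples(episode):
        print(f"{action!r}: reward={reward:+.1f}, weight={weight:.3f}")
```

With three steps from one role and one from another, the rare role's single sample is weighted more heavily, so both roles carry equal total weight in training. Any real system would need a finer-grained credit signal than a copied outcome.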
