2603.05120v1 Mar 05, 2026 cs.AI

Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Reasoning

Xiao Liu

Boci Peng

Xinping Zhao

Xiaoran Shang

Yun Zhu

Lijun Wu

Boren Hu

Original Abstract

Enhancing mathematical reasoning in Large Language Models typically demands massive datasets, yet data efficiency remains a critical bottleneck. While Curriculum Learning attempts to structure this process, standard unidirectional approaches (simple-to-complex) suffer from inefficient sample utilization: they blindly escalate complexity even when foundational gaps persist, leading to wasted computation on unsolvable problems. To maximize the instructional value of every training sample, we introduce a novel Bidirectional Curriculum Generation framework. Unlike rigid trajectories, our multi-agent ecosystem mimics adaptive pedagogy to establish a closed feedback loop. It dynamically generates data by either complicating problems to challenge the model or, crucially, simplifying them to repair specific reasoning failures. This mechanism ensures that the model consumes only the most effective data at any given stage. Grounded in the Optimal Pacing Theorem, our approach optimizes the learning trajectory, significantly outperforming baselines while achieving superior reasoning performance with substantially fewer instruction samples.
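The closed feedback loop described above (complicate on success, simplify on failure) can be illustrated with a minimal toy sketch. It rests on several assumptions that are not taken from the paper. Difficulty is an integer scale. The "model" solves a problem if and only if its difficulty is at most a fixed `skill` level. `complicate`, `simplify`, `bidirectional_step`, and `run_curriculum` are hypothetical stand-ins for the paper's generator agents, whose prompts and architecture the abstract does not specify. Nothing here implements the Optimal Pacing Theorem or the authors' training procedure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Problem:
    text: str
    difficulty: int  # hypothetical integer difficulty level; 1 = easiest


def complicate(p: Problem) -> Problem:
    # Hypothetical stand-in for a "complicator" agent; a real system would
    # rewrite the problem (e.g. with an LLM) rather than tag the text.
    return Problem(p.text + " [harder]", p.difficulty + 1)


def simplify(p: Problem) -> Problem:
    # Hypothetical stand-in for a "simplifier" agent that would target the
    # failed reasoning step; difficulty never drops below 1.
    return Problem(p.text + " [easier]", max(1, p.difficulty - 1))


def bidirectional_step(p: Problem, solved: bool) -> Problem:
    # Bidirectional rule: escalate after a success, repair after a failure.
    return complicate(p) if solved else simplify(p)


def run_curriculum(seed: Problem, skill: int, steps: int) -> list[int]:
    """Toy loop with a fixed-skill model: a problem counts as solved
    iff difficulty <= skill. Returns the difficulty trajectory."""
    trajectory = [seed.difficulty]
    p = seed
    for _ in range(steps):
        solved = p.difficulty <= skill
        p = bidirectional_step(p, solved)
        trajectory.append(p.difficulty)
    return trajectory


if __name__ == "__main__":
    print(run_curriculum(Problem("seed problem", 1), skill=3, steps=6))
    # → [1, 2, 3, 4, 3, 4, 3]
```

In this toy setting the generated difficulty climbs to the model's skill level and then oscillates around it. A unidirectional schedule would keep escalating past it. The sketch keeps the model's skill fixed for clarity; in the actual framework the model is trained on the generated samples, so its frontier would move over time.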
