2601.13761v2 Jan 20, 2026 cs.AI

DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution

Yankai Lin (Citations: 146, h-index: 7)

Shengda Fan (Citations: 49, h-index: 2)

Xuyan Ye (Citations: 0, h-index: 0)

Abstract

Self-play with large language models has emerged as a promising paradigm for achieving self-improving artificial intelligence. However, existing self-play frameworks often suffer from optimization instability, due to (i) non-stationary objectives induced by solver-dependent reward feedback for the Questioner, and (ii) bootstrapping errors from self-generated pseudo-labels used to supervise the Solver. To mitigate these challenges, we introduce DARC (Decoupled Asymmetric Reasoning Curriculum), a two-stage framework that stabilizes the self-evolution process. First, we train the Questioner to synthesize difficulty-calibrated questions, conditioned on explicit difficulty levels and external corpora. Second, we train the Solver with an asymmetric self-distillation mechanism, where a document-augmented teacher generates high-quality pseudo-labels to supervise the student Solver that lacks document access. Empirical results demonstrate that DARC is model-agnostic, yielding an average improvement of 10.9 points across nine reasoning benchmarks and three backbone models. Moreover, DARC consistently outperforms all baselines and approaches the performance of fully supervised models without relying on human annotations. The code is available at https://github.com/RUCBM/DARC.
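The asymmetric self-distillation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Generate` callable, the prompt templates, the majority vote over teacher samples, and the exact-match reward are all assumptions made for illustration. Only the asymmetry itself (the teacher sees the document, the student does not) comes from the abstract.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical interface: a model is any callable mapping a prompt to an answer string.
Generate = Callable[[str], str]


def teacher_pseudo_label(
    teacher: Generate, document: str, question: str, n_samples: int = 5
) -> Tuple[str, float]:
    """Query the document-augmented teacher and return (pseudo_label, agreement).

    Majority voting over several samples is an assumption of this sketch,
    not something the abstract specifies.
    """
    prompt = f"Document:\n{document}\n\nQuestion: {question}"
    answers = [teacher(prompt) for _ in range(n_samples)]
    label, count = Counter(answers).most_common(1)[0]
    return label, count / n_samples


def student_rewards(
    student: Generate, question: str, pseudo_label: str, n_rollouts: int = 4
) -> List[float]:
    """Score student rollouts against the pseudo-label.

    The student prompt contains only the question (no document), which is
    the asymmetry the abstract describes. Exact-match reward is an assumption.
    """
    prompt = f"Question: {question}"
    return [1.0 if student(prompt) == pseudo_label else 0.0 for _ in range(n_rollouts)]
```

In this sketch the teacher and student could be the same underlying model called with different prompts; the agreement ratio returned by `teacher_pseudo_label` could be used to discard low-confidence labels, though that filtering is likewise an assumption.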

0 Citations

0 Influential

34.486122886681 Altmetric

172.4 Score
