2601.21600v1 Jan 29, 2026 cs.AI

CORE: Collaborative Reasoning via Cross Teaching

Kshitij Mishra

Citations: 5

h-index: 1

Mirat Aubakirov

Citations: 0

h-index: 0

Martin Takáč

Citations: 61

h-index: 4

Nils Lukas

Citations: 23

h-index: 3

S. Lahlou

Citations: 1,605

h-index: 9

Original Abstract

Large language models exhibit complementary reasoning errors: on the same instance, one model may succeed with a particular decomposition while another fails. We propose Collaborative Reasoning (CORE), a training-time collaboration framework that converts peer success into a learning signal via a cross-teaching protocol. Each problem is solved in two stages: a cold round of independent sampling, followed by a contexted rescue round in which models that failed receive a hint extracted from a successful peer. CORE optimizes a combined reward that balances (i) correctness, (ii) a lightweight DPP-inspired diversity term to reduce error overlap, and (iii) an explicit rescue bonus for successful recovery. We evaluate CORE on four standard reasoning datasets: GSM8K, MATH, AIME, and GPQA. With only 1,000 training examples, a pair of small open-source models (3B+4B) reaches Pass@2 of 99.54% on GSM8K and 92.08% on MATH, compared to 82.50% and 74.82% for single-model training. On harder datasets, the 3B+4B pair reaches Pass@2 of 77.34% on GPQA (trained on 348 examples) and 79.65% on AIME (trained on 792 examples), using a training-time budget of at most 1536 context tokens and 3072 generated tokens. Overall, these results show that training-time collaboration can reliably convert model complementarity into large gains without scaling model size.
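The two-stage protocol and the three-part reward described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract gives no formulas, so the names `cross_teach`, `combined_reward`, and `dpp_diversity`, the weights `w_div` and `w_rescue`, the word-overlap similarity, and the 2x2 determinant used as the "DPP-inspired" term are all assumptions made for illustration. The "models" are plain Python callables standing in for LLM samplers.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in: a "model" maps a prompt string to a response string.
Model = Callable[[str], str]


@dataclass
class Outcome:
    response: str
    correct: bool
    rescued: bool  # True only if wrong in the cold round and right after a hint


def jaccard(a: str, b: str) -> float:
    """Word-set overlap in [0, 1]; an assumed similarity, not the paper's."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def dpp_diversity(a: str, b: str) -> float:
    """Determinant of the 2x2 kernel [[1, s], [s, 1]] = 1 - s^2 (assumed form)."""
    s = jaccard(a, b)
    return 1.0 - s * s


def cross_teach(problem: str, models: List[Model],
                is_correct: Callable[[str], bool],
                make_hint: Callable[[str], str]) -> List[Outcome]:
    # Cold round: every model answers independently.
    cold = [m(problem) for m in models]
    ok = [is_correct(r) for r in cold]
    outcomes = [Outcome(r, c, False) for r, c in zip(cold, ok)]
    # Rescue round: only runs when some model succeeded and some model failed.
    if any(ok) and not all(ok):
        hint = make_hint(cold[ok.index(True)])
        for i, m in enumerate(models):
            if not ok[i]:
                retry = m(f"{problem}\nHint: {hint}")
                fixed = is_correct(retry)
                outcomes[i] = Outcome(retry, fixed, fixed)
    return outcomes


def combined_reward(own: Outcome, peer: Outcome,
                    w_div: float = 0.1, w_rescue: float = 0.5) -> float:
    """Correctness + weighted diversity + rescue bonus (weights are assumptions)."""
    reward = 1.0 if own.correct else 0.0
    reward += w_div * dpp_diversity(own.response, peer.response)
    if own.rescued:
        reward += w_rescue
    return reward


# Toy demo: `strong` always succeeds, `weak` succeeds only when given a hint.
def strong(prompt: str) -> str:
    return "split into 3 groups of 4 so answer 12"


def weak(prompt: str) -> str:
    if "Hint:" in prompt:
        return "use the hint so answer 12"
    return "add 3 and 4 so answer 7"


outs = cross_teach(
    "What is 3 x 4?",
    [strong, weak],
    is_correct=lambda r: r.endswith("answer 12"),
    make_hint=lambda r: r.rsplit(" so ", 1)[0],  # drop the final answer
)
rewards = [combined_reward(outs[0], outs[1]), combined_reward(outs[1], outs[0])]
```

In this toy run the hint deliberately omits the final answer, so the failed model still has to finish the problem itself. A real system would presumably compute the diversity term over reasoning traces with a learned similarity rather than word overlap, and would feed the rewards into a policy-gradient update, neither of which is shown here.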
