2601.04237v1 Jan 04, 2026 cs.AI

SAGE-32B: Agentic Reasoning via Iterative Distillation

Basab Jha (Citations: 2, h-index: 1)
Firoj Paudel (Citations: 0, h-index: 0)
Ujjwal Puri (Citations: 12, h-index: 2)
Ethan Henkel (Citations: 0, h-index: 0)
Yuting Zhang (Citations: 0, h-index: 0)
Mei Huang (Citations: 0, h-index: 0)
M. Kowalczyk (Citations: 681, h-index: 10)
Donghyuk Choi (Citations: 0, h-index: 0)
Junhao Wang (Citations: 97, h-index: 3)

Original Abstract

We present SAGE-32B, a 32-billion-parameter language model focused on agentic reasoning and long-range planning tasks. Unlike chat models that aim for general conversational fluency, SAGE-32B is designed to operate in an agentic loop, emphasizing task decomposition, tool use, and error recovery. The model is initialized from the Qwen2.5-32B pretrained model and fine-tuned using Iterative Distillation, a two-stage training process that improves reasoning performance through rigorously tested feedback loops. SAGE-32B also introduces an inverse reasoning approach, which uses a metacognition head to forecast potential failures in the planning process before execution. On agentic reasoning benchmarks including MMLU-Pro, AgentBench, and MATH-500, SAGE-32B achieves higher success rates in multi-tool usage scenarios than similarly sized baseline models, while remaining competitive on standard reasoning evaluations. Model weights are publicly released at https://huggingface.co/sagea-ai/sage-reasoning-32b
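The abstract describes an agentic loop with task decomposition, tool use, error recovery, and a pre-execution failure forecast, but it gives no implementation details. The following is only a toy sketch of that control flow, not the authors' method: `forecast_failure` is a hand-written heuristic standing in for a learned metacognition head, and every name here (`Step`, `Trace`, `run_agent`, `repair`, the `calculator` tool) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A tool maps a string argument to a string result and may raise on failure.
Tool = Callable[[str], str]


@dataclass
class Step:
    tool: str  # name of the tool to call
    arg: str   # argument passed to the tool


@dataclass
class Trace:
    results: List[str] = field(default_factory=list)
    skipped: List[Step] = field(default_factory=list)
    retries: int = 0


def forecast_failure(step: Step, tools: Dict[str, Tool]) -> float:
    """Assumed stand-in for a learned failure predictor: returns a risk in [0, 1]."""
    if step.tool not in tools:
        return 1.0  # unknown tool: the call cannot succeed
    if not step.arg.strip():
        return 0.9  # empty argument: very likely to fail
    return 0.1


def run_agent(plan: List[Step], tools: Dict[str, Tool],
              repair: Callable[[Step], Step],
              risk_threshold: float = 0.5, max_retries: int = 1) -> Trace:
    trace = Trace()
    for step in plan:
        # Planning-time check: try to repair a risky step before executing it.
        if forecast_failure(step, tools) >= risk_threshold:
            step = repair(step)
            if forecast_failure(step, tools) >= risk_threshold:
                trace.skipped.append(step)
                continue
        # Execution with simple error recovery: repair and retry on exceptions.
        attempts = 0
        while True:
            try:
                trace.results.append(tools[step.tool](step.arg))
                break
            except Exception:
                if attempts >= max_retries:
                    trace.skipped.append(step)
                    break
                attempts += 1
                trace.retries += 1
                step = repair(step)
    return trace


def calculator(expr: str) -> str:
    """Toy tool: evaluates 'a + b' or 'a / b' with space-separated tokens."""
    a, op, b = expr.split()
    x, y = float(a), float(b)
    if op == "+":
        return str(x + y)
    if op == "/":
        return str(x / y)  # raises ZeroDivisionError when y == 0
    raise ValueError(f"unsupported operator: {op}")


def repair(step: Step) -> Step:
    """Toy repair policy: fix a misspelled tool name or a division by zero."""
    if step.tool == "calc":
        return Step("calculator", step.arg)
    if step.arg.endswith("/ 0"):
        return Step(step.tool, step.arg[:-1] + "1")
    return step


plan = [Step("calculator", "2 + 3"), Step("calc", "4 + 4"),
        Step("calculator", "1 / 0"), Step("search", "x")]
trace = run_agent(plan, {"calculator": calculator}, repair)
```

In this sketch the misspelled `calc` step is repaired before execution, the division by zero is recovered after one failed call, and the `search` step is skipped because no such tool is registered. A real system would replace both the heuristic forecaster and the repair policy with model-driven components.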

0 Citations, 0 Influential, 25 Altmetric, 125.0 Score
