2601.06160v1 Jan 06, 2026 cs.AI

Student Guides Teacher: Weak-to-Strong Inference via Spectral Orthogonal Exploration

Dayu Wang

Jiaye Yang

Weikang Li

Jiahui Liang

Yang Li

Original Abstract

While Large Language Models (LLMs) demonstrate near-human capabilities, they often suffer from "Reasoning Collapse" in complex mathematical proving and long-horizon planning. Models tend to degenerate into a low-rank Bias Manifold, where stochastic sampling merely produces lexical variations of erroneous logic rather than genuine semantic exploration. This geometric collapse renders the model "blind" to high-value solutions that lie within its Null Space. To address this, we propose Spectral Orthogonal Exploration (SOE), a geometric framework built on a counter-intuitive "Student Guides Teacher" paradigm. Specifically, we use a weak auxiliary agent not as a target for imitation but as an orthogonal probe. By explicitly navigating the Teacher's Null Space, SOE serves as a geometric bridge, ejecting the model from local optima so that it can explore diverse, high-value solution spaces. Experiments on mathematical benchmarks show that, relative to baseline methods, our approach improves average accuracy by 62.4% and increases average sampling efficiency by 113.7%, indicating a promising path toward overcoming performance plateaus in advanced reasoning tasks.
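The abstract describes the geometry only in words: samples collapse onto a low-rank dominant subspace, and useful directions lie outside it. The sketch below illustrates that generic idea and is not the SOE algorithm, whose details the abstract does not give. It assumes (hypothetically) that each teacher sample is represented as an embedding vector. Under that assumption, it estimates the top-k dominant subspace with plain subspace (orthogonal) iteration on the uncentered sample matrix, then removes that subspace from a "probe" vector, such as a weak model's embedding, leaving only the component in the orthogonal complement. The names `top_subspace` and `orthogonal_component` are illustrative, and pure Python is used so the example has no dependencies.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def gram_schmidt(vectors: List[Vector], tol: float = 1e-10) -> List[Vector]:
    """Orthonormalize vectors, dropping any that are (numerically) dependent."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n0, n = norm(v), norm(w)
        if n0 > 0 and n > tol * n0:
            basis.append([wi / n for wi in w])
    return basis


def top_subspace(samples: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: approximate the top-k right singular vectors of the
    (uncentered) sample matrix S by subspace iteration on S^T S."""
    d = len(samples[0])
    rng = random.Random(seed)
    q = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)])
    for _ in range(iters):
        z = []
        for v in q:
            # S^T S v = sum_i (s_i . v) s_i
            acc = [0.0] * d
            for s in samples:
                c = dot(s, v)
                acc = [a + c * si for a, si in zip(acc, s)]
            z.append(acc)
        q = gram_schmidt(z)
    return q


def orthogonal_component(probe: Vector, basis: List[Vector]) -> Vector:
    """Hypothetical helper: remove the span of an orthonormal basis from probe."""
    r = list(probe)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


if __name__ == "__main__":
    # Toy "collapsed" teacher samples: every vector lies along the first axis.
    teacher = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [-3.0, 0.0, 0.0]]
    basis = top_subspace(teacher, k=1)
    probe = [1.0, 2.0, 3.0]
    print(orthogonal_component(probe, basis))
```

Subspace iteration is used here only because it is short and dependency-free; in practice one would more likely call an SVD routine. Whether SOE works in embedding space, how it chooses k, and how the weak agent's probe is formed are not stated in the abstract, so each of those choices above is an assumption.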