2601.13752v1 Jan 20, 2026 cs.AI

Finding RELIEF: Shaping Reasoning Behavior without Reasoning Supervision via Belief Engineering

Dingwei Chen

Jian Wang

Chak Tou Leong

Heming Xia

Qingyu Yin

Wenjie Li

Sunbowen Lee

Abstract

Large reasoning models (LRMs) have achieved remarkable success in complex problem-solving, yet they often suffer from computational redundancy or reasoning unfaithfulness. Current methods for shaping LRM behavior typically rely on reinforcement learning or fine-tuning with gold-standard reasoning traces, a paradigm that is both computationally expensive and difficult to scale. In this paper, we reveal that LRMs possess latent "reasoning beliefs" that internally track their own reasoning traits, which can be captured through simple logit probing. Building upon this insight, we propose Reasoning Belief Engineering (RELIEF), a simple yet effective framework that shapes LRM behavior by aligning the model's self-concept with a target belief blueprint. Crucially, RELIEF completely bypasses the need for reasoning-trace supervision. It internalizes desired traits by fine-tuning on synthesized, self-reflective question-answering pairs that affirm the target belief. Extensive experiments on efficiency and faithfulness tasks demonstrate that RELIEF matches or outperforms behavior-supervised and preference-based baselines while requiring lower training costs. Further analysis validates that shifting a model's reasoning belief effectively shapes its actual behavior.
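
The abstract does not spell out how the logit probe is computed, so the following is only a minimal sketch of one plausible reading: ask the model a self-reflective yes/no question, read the next-token logits, and take the renormalized probability of the affirming token as a belief score. The function name `belief_score`, the "Yes"/"No" candidate tokens, and the logit values are all hypothetical and are not taken from the paper.

```python
import math


def belief_score(logits, affirm_token="Yes", deny_token="No"):
    """Return the probability of the affirming token, renormalized over
    the two candidate tokens only (a two-way softmax).

    `logits` is a hypothetical mapping from token string to next-token
    logit, read after a self-reflective question such as
    "Do you reason concisely? Answer Yes or No."
    """
    a = logits[affirm_token]
    d = logits[deny_token]
    m = max(a, d)  # subtract the max for numerical stability
    exp_a = math.exp(a - m)
    exp_d = math.exp(d - m)
    return exp_a / (exp_a + exp_d)


# Illustrative (made-up) logits before and after belief-affirming fine-tuning.
logits_before = {"Yes": 0.2, "No": 1.4, "Maybe": -0.5}
logits_after = {"Yes": 2.1, "No": 0.3, "Maybe": -0.7}

score_before = belief_score(logits_before)
score_after = belief_score(logits_after)
print(f"belief before: {score_before:.3f}, after: {score_after:.3f}")
```

Under this reading, a score above 0.5 means the model leans toward affirming the trait, and a rise in the score after fine-tuning would correspond to the belief shift the abstract describes. Restricting the softmax to two candidate tokens is a simplification chosen for the sketch.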
