2602.00085v1 Jan 22, 2026 cs.LG

CARE-RFT: Confidence-Anchored Reinforcement Finetuning for Reliable Reasoning in Large Language Models

Aryan Mokhtari

Citations: 18

h-index: 2

Shuozhe Li

Citations: 24

h-index: 3

Leqi Liu

Citations: 1

h-index: 1

Jincheng Cao

Citations: 32

h-index: 3

Bo Hu

Citations: 90

h-index: 5

Amy Zhang

Citations: 3

h-index: 1

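Reinforcement finetuning (RFT) has emerged as a powerful approach for improving the reasoning ability of large language models. However, we identify an important trade-off. Unconstrained RFT achieves strong reasoning performance, but it amplifies hallucination, undermines model trustworthiness, and degrades calibration. Conversely, RFT under a reverse KL (RKL) constraint preserves trustworthiness, but its unbounded penalty on exploratory deviations limits the reasoning gains. To resolve this tension, we propose CARE-RFT (Confidence-Anchored Regularized Reinforcement Finetuning), a new method that replaces standard reverse KL regularization with a skew reverse KL divergence. CARE-RFT provides a confidence-sensitive penalty: the penalty is bounded for confident, consistently rewarded explorations, which enables reasoning, and unbounded elsewhere, which preserves calibration. Extensive experiments across multiple model scales and RFT algorithms show that CARE-RFT strikes a better balance, matching the reasoning performance of unconstrained RFT while recovering the trustworthiness and calibration of the base model. Our work establishes that careful, confidence-aware regularization is key to building reasoning models that are both capable and trustworthy.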

Original Abstract

Reinforcement finetuning (RFT) has emerged as a powerful paradigm for unlocking reasoning capabilities in large language models. However, we identify a critical trade-off: while unconstrained RFT achieves strong reasoning performance, it severely compromises model trustworthiness by amplifying hallucination and worsening calibration; conversely, RKL-constrained RFT preserves trustworthiness but limits reasoning gains due to its unbounded penalty on exploratory deviations. To resolve this tension, we introduce CARE-RFT (Confidence-Anchored Regularized Reinforcement Finetuning), a novel method that replaces standard reverse KL regularization with a skew reverse KL divergence. CARE-RFT provides a confidence-sensitive penalty: it is bounded for confident, consistently rewarded explorations to enable reasoning, while unbounded elsewhere to preserve calibration. Extensive experiments across multiple model scales and RFT algorithms show that CARE-RFT achieves a superior balance, matching the reasoning performance of unconstrained RFT while recovering the trustworthiness and calibration of the base model. Our work establishes that careful, confidence-aware regularization is key to building both capable and trustworthy reasoning models.
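The abstract contrasts the unbounded penalty of reverse KL with a skew reverse KL divergence but does not state the formula. The sketch below is only an illustration under an assumed definition: it uses one common form of skewing, KL(policy || alpha * policy + (1 - alpha) * reference), which may differ from the paper's exact regularizer. The function names, the toy distributions, and the default `alpha=0.1` are all hypothetical. Under this assumed form, the divergence can never exceed log(1/alpha), whereas plain reverse KL grows without bound as the policy moves mass onto tokens the reference considers nearly impossible.

```python
import math


def reverse_kl(policy, reference):
    """Reverse KL, KL(policy || reference), for two discrete distributions.

    Both arguments are lists of probabilities over the same support.
    Terms with zero policy mass contribute nothing.
    """
    return sum(p * math.log(p / q) for p, q in zip(policy, reference) if p > 0)


def skew_reverse_kl(policy, reference, alpha=0.1):
    """Assumed skew form: KL(policy || alpha*policy + (1-alpha)*reference).

    This is an illustrative definition, not necessarily the one used by
    CARE-RFT. Mixing a little of the policy into the second argument caps
    each ratio p / mix at 1/alpha, so the total is at most log(1/alpha).
    """
    total = 0.0
    for p, q in zip(policy, reference):
        if p > 0:
            mix = alpha * p + (1 - alpha) * q
            total += p * math.log(p / mix)
    return total


# Toy two-token vocabulary (hypothetical numbers).
# The reference model almost never emits token 1; the policy is confident in it.
reference = [0.999999, 0.000001]
policy = [0.01, 0.99]

rkl = reverse_kl(policy, reference)
skl = skew_reverse_kl(policy, reference, alpha=0.1)

print(f"reverse KL      = {rkl:.4f}")
print(f"skew reverse KL = {skl:.4f}")
print(f"log(1/alpha)    = {math.log(1 / 0.1):.4f}")
```

In this toy setting the skewed value stays below log(1/alpha) no matter how small the reference probability becomes, which is one way to read the abstract's claim of a bounded penalty on confident explorations. How CARE-RFT keeps the penalty unbounded elsewhere depends on details not given in the abstract.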

1 Citations

0 Influential

2.5 Altmetric

13.5 Score
