2604.16158v1 Apr 17, 2026 cs.CL

AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency

Kristian Kersting (Citations: 83, h-index: 5)

Max Henning Hoth (Citations: 0, h-index: 0)

Bjorn Deiseroth (Citations: 799, h-index: 9)

Letitia Parcalabescu (Citations: 0, h-index: 0)

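Large language models (LLMs) increasingly rely on step-by-step, chain-of-thought (CoT) reasoning to solve complex tasks. Verifying that the reasoning trace both contributes to the model's final answer and faithfully reflects it remains difficult. The paper introduces AtManRL, a method that uses differentiable attention manipulation to learn more faithful reasoning through reinforcement learning. An additive attention mask is trained to identify the CoT tokens that matter for producing the correct answer, and from it the authors derive a saliency reward signal that steers the model toward reasoning that actually influences its final prediction. This saliency reward is combined with an outcome-based reward within the GRPO framework so that correctness and interpretability are optimized together. In experiments on GSM8K and MMLU with Llama-3.2-3B-Instruct, AtManRL identifies influential reasoning tokens and supports training more transparent reasoning models.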

Original Abstract

Large language models (LLMs) increasingly rely on chain-of-thought (CoT) reasoning to solve complex tasks. Yet ensuring that the reasoning trace both contributes to and faithfully reflects the processes underlying the model's final answer, rather than merely accompanying it, remains challenging. We introduce AtManRL, a method that leverages differentiable attention manipulation to learn more faithful reasoning through reinforcement learning. By training an additive attention mask that identifies tokens in the CoT crucial for producing correct answers, we derive a saliency reward signal that encourages the model to generate reasoning traces that genuinely influence its final predictions. We integrate this saliency reward with outcome-based rewards within the GRPO framework to jointly optimize for correctness and interpretability. Experiments on GSM8K and MMLU with Llama-3.2-3B-Instruct demonstrate that our approach can identify influential reasoning tokens and enable training more transparent reasoning models.
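
The abstract names two mechanisms that a toy example can make concrete: an additive mask applied to attention scores before the softmax, and a saliency reward combined with an outcome reward under GRPO-style group normalization. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The function names, the `weight` mixing coefficient, the sum-based combination of rewards, and the use of the population standard deviation are all hypothetical choices; the abstract does not specify how the saliency reward is computed, so saliency scores are taken here as given inputs.

```python
import math
from statistics import mean, pstdev


def masked_attention_weights(logits, additive_mask):
    """Softmax over attention logits after adding a per-token mask.

    A strongly negative mask entry suppresses that token; an entry of 0
    leaves it unchanged. (Hypothetical helper, single query position.)
    """
    shifted = [l + m for l, m in zip(logits, additive_mask)]
    peak = max(shifted)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def combined_rewards(outcome, saliency, weight=0.5):
    """Assumed combination: outcome reward plus a weighted saliency reward."""
    return [o + weight * s for o, s in zip(outcome, saliency)]


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group.

    Uses the population standard deviation; implementations differ on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three CoT tokens with equal scores; the mask suppresses the third.
    print(masked_attention_weights([1.0, 1.0, 1.0], [0.0, 0.0, -1e9]))

    # Four sampled completions for one prompt (made-up reward values).
    rewards = combined_rewards([1.0, 0.0, 1.0, 0.0], [0.8, 0.2, 0.1, 0.9])
    print(group_relative_advantages(rewards))
```

In this toy setting, a completion that is correct and whose reasoning tokens score as salient receives the largest advantage in its group, which matches the joint objective described in the abstract.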

0 Citations

0 Influential

4.5 Altmetric

22.5 Score
