2604.27251v1 Apr 29, 2026 cs.CL

Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

Yuxiang Zhou

Citations: 108

h-index: 6

Mahmud Elahi Akhter

North South University

Citations: 59

h-index: 5

Xingwei Tan

Citations: 29

h-index: 4

Nikolaos Aletras

Citations: 28

h-index: 3

Marco Valentino

Idiap Research Institute

Citations: 722

h-index: 15

Maria Liakata

Citations: 12

h-index: 3

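Large language models (LLMs) acquire reasoning capabilities through inference patterns shared across their pre-training data, and Chain-of-Thought (CoT) prompting further elicits these capabilities. Whether fundamental reasoning patterns such as induction, deduction, and abduction can be decoupled from specific problem instances, however, remains an open challenge for model controllability. This paper investigates the question systematically through the lens of reasoning conflicts: an explicit tension between parametric and contextual information that arises when a model is required to use a logical schema different from the one expected for the task. The evaluation shows that LLMs consistently prioritize sensibility over compliance, preferring the task-appropriate reasoning pattern even when instructed otherwise. Notably, task accuracy is not strictly determined by sensibility: models often maintain high performance even when using the conflicting pattern, which suggests a reliance on internalized parametric memory that grows with model size. Reasoning conflicts are also internally detectable, as confidence scores drop significantly in conflicting cases. Probing experiments confirm that reasoning types are linearly encoded in the middle-to-late layers, indicating that activation-level control is possible. Building on these insights, the authors steer models toward compliance, improving instruction following by up to 29%. Overall, the findings establish that although LLM reasoning is anchored to concrete instances, active mechanistic interventions can effectively decouple logical schemata from data, offering a path toward improved controllability, faithfulness, and generalizability.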

Original Abstract

Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) practices. However, whether fundamental reasoning patterns, such as induction, deduction, and abduction, can be decoupled from specific problem instances remains a critical challenge for model controllability, and for shedding light on reasoning controllability. In this paper, we present the first systematic investigation of this problem through the lens of reasoning conflicts: an explicit tension between parametric and contextual information induced by mandating logical schemata that deviate from those expected for a target task. Our evaluation reveals that LLMs consistently prioritize sensibility over compliance, favoring task-appropriate reasoning patterns despite conflicting instructions. Notably, task accuracy is not strictly determined by sensibility, with models often maintaining high performance even when using conflicting patterns, suggesting a reliance on internalized parametric memory that increases with model size. We further demonstrate that reasoning conflicts are internally detectable, as confidence scores significantly drop during conflicting episodes. Probing experiments confirm that reasoning types are linearly encoded from middle-to-late layers, indicating the potential for activation-level controllability. Leveraging these insights, we steer models towards compliance, increasing instruction following by up to 29%. Overall, our findings establish that while LLM reasoning is anchored to concrete instances, active mechanistic interventions can effectively decouple logical schemata from data, offering a path toward improved controllability, faithfulness, and generalizability.
