2602.14529v1 Feb 16, 2026 cs.AI

Disentangling Deception and Hallucination Failures in LLMs

Haolang Lu (Citations: 198, h-index: 4)
Guoshun Nan (Citations: 75, h-index: 3)
Kun Wang (Citations: 102, h-index: 4)
Hongcan Guo (Citations: 110, h-index: 5)
Hongrui Peng (Citations: 16, h-index: 2)
Weiye Fu (Citations: 5, h-index: 2)
Xinye Cao (Citations: 192, h-index: 4)
Xingrui Li (Citations: 42, h-index: 5)

Original Abstract

Failures in large language models (LLMs) are often analyzed from a behavioral perspective, where incorrect outputs in factual question answering are commonly associated with missing knowledge. In this work, focusing on entity-based factual queries, we suggest that such a view may conflate different failure mechanisms, and propose an internal, mechanism-oriented perspective that separates Knowledge Existence from Behavior Expression. Under this formulation, hallucination and deception correspond to two qualitatively different failure modes that may appear similar at the output level but differ in their underlying mechanisms. To study this distinction, we construct a controlled environment for entity-centric factual questions in which knowledge is preserved while behavioral expression is selectively altered, enabling systematic analysis of four behavioral cases. We analyze these failure modes through representation separability, sparse interpretability, and inference-time activation steering.
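
One way to picture the separation of Knowledge Existence from Behavior Expression is as a two-by-two grid, which is a plausible reading of the "four behavioral cases" mentioned in the abstract. The sketch below is illustrative only: the case names (`CORRECT`, `DECEPTION`, `LUCKY_GUESS`, `HALLUCINATION`), the `classify` helper, and the mapping of each grid cell to a label are assumptions made here, not definitions taken from the paper, and the abstract does not say how knowledge existence is measured in practice.

```python
from enum import Enum


class Case(Enum):
    # Hypothetical labels for the four cells of the grid (not the paper's terms).
    CORRECT = "knowledge present, expressed correctly"
    DECEPTION = "knowledge present, expressed incorrectly"
    LUCKY_GUESS = "knowledge absent, output happens to be correct"
    HALLUCINATION = "knowledge absent, output incorrect"


def classify(knowledge_exists: bool, output_correct: bool) -> Case:
    """Map the two axes onto one of four cases.

    Both inputs are assumed to be given; deciding whether a model
    internally holds a fact is the hard part and is not modeled here.
    """
    if knowledge_exists:
        return Case.CORRECT if output_correct else Case.DECEPTION
    return Case.LUCKY_GUESS if output_correct else Case.HALLUCINATION


if __name__ == "__main__":
    # Two wrong answers that look identical at the output level
    # fall into different cells once the knowledge axis is considered.
    for knows in (True, False):
        print(knows, classify(knows, output_correct=False).name)
```

Under this reading, a purely behavioral analysis sees only the `output_correct` axis, which is why the two incorrect cases can be conflated.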
