2601.05905v1 Jan 09, 2026 cs.CL

Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency

Jeff Z. Pan

Citations: 12

h-index: 2

Yunzhi Yao

Zhejiang University; Shandong University

Citations: 3,041

h-index: 21

Shumin Deng

Citations: 5,854

h-index: 39

Huajun Chen

Citations: 4,339

h-index: 32

Haoming Xu

Citations: 149

h-index: 4

Ningyuan Zhao

Citations: 73

h-index: 3

Weihong Xu

Citations: 70

h-index: 2

Xinle Deng

Citations: 93

h-index: 5

Ningyu Zhang

Citations: 178

h-index: 7

Hongru Wang

The Chinese University of Hong Kong, University of Edinburgh

Citations: 2,219

h-index: 24

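As large language models (LLMs) are increasingly deployed in real-world settings, accuracy alone is not enough. Reliable deployment requires that a model keep its truthful beliefs even when the context changes. Existing evaluations rely heavily on point-wise confidence measures such as Self-Consistency, which can mask unstable beliefs. We show that even facts answered with perfect Self-Consistency can collapse quickly under mild contextual interference. To close this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates the consistency of responses across a conceptual neighborhood. To validate NCB, we introduce a new cognitive stress-testing protocol that tests output stability under contextual interference. Experiments on multiple LLMs show that data with high NCB values is comparatively more robust to interference. Finally, we present Structure-Aware Training (SAT), which optimizes context-invariant belief structure and reduces the brittleness of long-tail knowledge by about 30%. Code will be available at https://github.com/zjunlp/belief.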

Original Abstract

As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence like Self-Consistency, which can mask brittle belief. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the efficiency of NCB, we introduce a new cognitive stress-testing protocol that probes outputs stability under contextual interference. Experiments across multiple LLMs show that the performance of high-NCB data is relatively more resistant to interference. Finally, we present Structure-Aware Training (SAT), which optimizes context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%. Code will be available at https://github.com/zjunlp/belief.
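
The contrast the abstract draws between point-wise confidence and neighborhood-level consistency can be illustrated with a toy sketch. This is not the paper's NCB definition, which the abstract does not give. The function names, the exact-match scoring, the simple averaging, and the sample answers below are all assumptions made up for illustration.

```python
from collections import Counter
from typing import Mapping, Sequence, Tuple


def _norm(answer: str) -> str:
    """Normalize an answer for exact-match comparison."""
    return answer.strip().lower()


def self_consistency(samples: Sequence[str]) -> float:
    """Point-wise confidence: share of samples that match the majority answer."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    _, count = Counter(_norm(s) for s in samples).most_common(1)[0]
    return count / len(samples)


def neighborhood_consistency(
    neighbors: Mapping[str, Tuple[Sequence[str], str]],
) -> float:
    """Hypothetical neighborhood score (an assumption, not the paper's formula):
    the mean, over related questions, of the share of samples matching the
    expected answer."""
    if not neighbors:
        raise ValueError("need at least one neighbor question")
    scores = []
    for samples, expected in neighbors.values():
        if not samples:
            raise ValueError("each neighbor needs at least one sample")
        hits = sum(1 for s in samples if _norm(s) == _norm(expected))
        scores.append(hits / len(samples))
    return sum(scores) / len(scores)


# Made-up sampled answers for a target fact and two related questions.
target_samples = ["Canberra", "Canberra", "canberra", "Canberra"]
neighbor_data = {
    "Which country has Canberra as its capital?": (
        ["Australia", "Australia", "Australia", "Australia"],
        "Australia",
    ),
    "What is the largest city in Australia?": (
        ["Sydney", "Melbourne", "Sydney", "Canberra"],
        "Sydney",
    ),
}

sc = self_consistency(target_samples)
nc = neighborhood_consistency(neighbor_data)
```

In this toy data the target fact is answered identically in every sample, yet the answers to a related question disagree. That is the kind of gap a neighborhood-level measure is meant to expose and a point-wise one cannot.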

0 Citations

0 Influential

49.23 Altmetric

246.1 Score
