2601.18753v2 Jan 26, 2026 cs.LG

HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs

Junhong Lin, Xinyue Zeng, Yujun Yan, Feng Guo, Liang Shi, Jun Wu, Dawei Zhou

Abstract

The reliability of Large Language Models (LLMs) in high-stakes domains such as healthcare, law, and scientific discovery is often compromised by hallucinations. These failures typically stem from two sources: data-driven hallucinations and reasoning-driven hallucinations. However, existing detection methods usually address only one source and rely on task-specific heuristics, limiting their generalization to complex scenarios. To overcome these limitations, we introduce the Hallucination Risk Bound, a unified theoretical framework that formally decomposes hallucination risk into data-driven and reasoning-driven components, linked respectively to training-time mismatches and inference-time instabilities. This provides a principled foundation for analyzing how hallucinations emerge and evolve. Building on this foundation, we introduce HalluGuard, an NTK-based score that leverages the induced geometry and captured representations of the NTK to jointly identify data-driven and reasoning-driven hallucinations. We evaluate HalluGuard on 10 diverse benchmarks, 11 competitive baselines, and 9 popular LLM backbones, consistently achieving state-of-the-art performance in detecting diverse forms of LLM hallucinations. We open-source our proposed HalluGuard model at https://github.com/Susan571/HalluGuard-ICLR2026.
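
The abstract does not say how the HalluGuard score is computed from the NTK, so the following is only background for readers unfamiliar with the term. It is a minimal sketch of the standard empirical Neural Tangent Kernel, K(x, x') = ∇θ f(x)ᵀ ∇θ f(x'), evaluated on a toy one-hidden-layer network. The toy model, the function names (`param_gradient`, `empirical_ntk`), the width, and the inputs are all illustrative assumptions and are not taken from the paper or its repository.

```python
import math
import random


def param_gradient(x, w, v):
    """Gradient of the toy model f(x) = sum_j v[j] * tanh(w[j] * x)
    with respect to all parameters, ordered as (w, then v).
    The model is an illustrative assumption, not the paper's architecture."""
    hidden = [math.tanh(wj * x) for wj in w]
    # d f / d w[j] = v[j] * (1 - tanh^2(w[j] * x)) * x
    grad_w = [vj * (1.0 - h * h) * x for vj, h in zip(v, hidden)]
    # d f / d v[j] = tanh(w[j] * x)
    grad_v = hidden
    return grad_w + grad_v


def empirical_ntk(xs, w, v):
    """Gram matrix K[i][j] = <grad f(xs[i]), grad f(xs[j])>."""
    grads = [param_gradient(x, w, v) for x in xs]
    return [[sum(a * b for a, b in zip(gi, gj)) for gj in grads] for gi in grads]


# Hypothetical setup: a width-8 network with fixed random parameters.
rng = random.Random(0)
width = 8
w = [rng.gauss(0.0, 1.0) for _ in range(width)]
v = [rng.gauss(0.0, 1.0) / math.sqrt(width) for _ in range(width)]

inputs = [-1.0, 0.5, 2.0]
K = empirical_ntk(inputs, w, v)

for row in K:
    print(["%.4f" % entry for entry in row])
```

Because every entry is an inner product of parameter gradients, the resulting matrix is symmetric and positive semidefinite. This is the kind of kernel geometry the abstract refers to, although how HalluGuard turns it into a hallucination score is described in the paper itself, not here.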
