2601.19245v1 Jan 27, 2026 cs.AI

Beyond In-Domain Detection: SpikeScore for Cross-Domain Hallucination Detection

Yongxin Deng (Citations: 57, h-index: 5)

Zhen Fang (Citations: 20, h-index: 2)

Ling Chen (Citations: 9, h-index: 1)

Yixuan Li (Citations: 176, h-index: 7)

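Hallucination detection is critical for deploying large language models (LLMs) in real-world settings. Existing hallucination detection methods perform strongly when the training and test data belong to the same domain, but their cross-domain generalization is poor. In this paper, we study "generalizable hallucination detection" (GHD), an important yet previously overlooked problem that aims to train a hallucination detector on data from a single domain while ensuring robust performance across diverse related domains. While studying GHD, we simulated multi-turn dialogues that continue from an LLM's initial response and observed an interesting phenomenon: multi-turn dialogues that begin with a hallucination universally show larger uncertainty fluctuations than fact-based dialogues across different domains. Building on this phenomenon, we propose SpikeScore, a new metric that quantifies abrupt fluctuations within a multi-turn dialogue. Through theoretical analysis and empirical validation, we show that SpikeScore achieves strong cross-domain separability between hallucinated and non-hallucinated responses. In experiments across multiple LLMs and benchmarks, the SpikeScore-based detection method outperformed representative baselines in cross-domain generalization and also surpassed advanced generalization-oriented methods, confirming its effectiveness for cross-domain hallucination detection.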

Original Abstract

Hallucination detection is critical for deploying large language models (LLMs) in real-world applications. Existing hallucination detection methods achieve strong performance when the training and test data come from the same domain, but they suffer from poor cross-domain generalization. In this paper, we study an important yet overlooked problem, termed generalizable hallucination detection (GHD), which aims to train hallucination detectors on data from a single domain while ensuring robust performance across diverse related domains. In studying GHD, we simulate multi-turn dialogues following an LLM's initial response and observe an interesting phenomenon: hallucination-initiated multi-turn dialogues universally exhibit larger uncertainty fluctuations than factual ones across different domains. Based on this phenomenon, we propose a new score, SpikeScore, which quantifies abrupt fluctuations in multi-turn dialogues. Through both theoretical analysis and empirical validation, we demonstrate that SpikeScore achieves strong cross-domain separability between hallucinated and non-hallucinated responses. Experiments across multiple LLMs and benchmarks demonstrate that the SpikeScore-based detection method outperforms representative baselines in cross-domain generalization and surpasses advanced generalization-oriented methods, verifying the effectiveness of our method in cross-domain hallucination detection.
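
The abstract does not give the formula for SpikeScore, so the following is only an illustrative sketch of what "quantifying abrupt fluctuations" could look like. It assumes that each simulated dialogue turn has already been assigned a scalar uncertainty value, and it uses the largest absolute second-order difference of that sequence as a stand-in spike measure. The function names `spike_score` and `flag_hallucination`, the threshold, and the example numbers are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def spike_score(uncertainties: Sequence[float]) -> float:
    """Hypothetical spike measure (an assumption, not the paper's definition).

    Returns the largest absolute second-order difference of a per-turn
    uncertainty sequence, which is large when one turn jumps sharply
    relative to its two neighbours.
    """
    if len(uncertainties) < 3:
        # A second-order difference needs at least three turns.
        return 0.0
    return max(
        abs(uncertainties[i + 1] - 2 * uncertainties[i] + uncertainties[i - 1])
        for i in range(1, len(uncertainties) - 1)
    )


def flag_hallucination(uncertainties: Sequence[float], threshold: float) -> bool:
    """Flag a dialogue when its spike measure exceeds a chosen threshold."""
    return spike_score(uncertainties) > threshold


# Made-up per-turn uncertainty values, for illustration only.
factual = [0.20, 0.22, 0.21, 0.23, 0.22]
hallucinated = [0.20, 0.25, 0.80, 0.30, 0.28]

print(spike_score(factual), spike_score(hallucinated))
```

Because the score depends only on how the uncertainty changes between turns and not on its absolute level, a single threshold might plausibly transfer across domains with different baseline uncertainty. That is the intuition the abstract points to, not a result this sketch establishes.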
