2604.16909v1 Apr 18, 2026 cs.CL

PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations

Wei Wang (Citations: 0, h-index: 0)
Guang Zhang (Citations: 17, h-index: 3)
Yujie Chen (Citations: 4, h-index: 1)
Guangyu Wang (Citations: 15, h-index: 2)
Yuran Chen (Citations: 207, h-index: 7)
Jiatong Zhang (Citations: 25, h-index: 3)
Yutong Zhang (Citations: 748, h-index: 9)
Jiaming Shang (Citations: 0, h-index: 0)
Zhuang Liu (Citations: 3, h-index: 1)

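As large language models (LLMs) evolve from simple conversational assistants into agents that can carry out complex tasks, they are increasingly used in high-risk domains. Existing benchmarks, however, mostly pose mixed types of questions and measure the severity of hallucination by scoring outputs after generation, which gives limited insight into where and why hallucinations arise during generation. We therefore reframe hallucination evaluation as a diagnostic problem and propose PRISM, a controlled benchmark that separates hallucinations into four dimensions: missing knowledge, knowledge errors, reasoning errors, and instruction-following errors. PRISM contains 9,448 instances across 65 tasks and supports fine-grained diagnostic evaluation that accounts for each stage of generation (memory, instruction, and reasoning). An evaluation of 24 mainstream open-source and proprietary LLMs shows consistent trade-offs among instruction following, memory retrieval, and logical reasoning, and finds that mitigation strategies often degrade performance on other dimensions. We hope PRISM offers a framework for understanding the specific mechanisms behind LLM hallucinations and ultimately helps accelerate the development of trustworthy large language models.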

Original Abstract

As large language models (LLMs) evolve from conversational assistants into agents capable of handling complex tasks, they are increasingly deployed in high-risk domains. However, existing benchmarks largely rely on mixed queries and post-hoc, output-level scoring, which quantifies hallucination severity but offers limited insight into where and why hallucinations arise in the generation pipeline. We therefore reformulate hallucination evaluation as a diagnostic problem and propose PRISM, a controlled benchmark that disentangles hallucinations into four dimensions (knowledge missing, knowledge errors, reasoning errors, and instruction-following errors), grounded in three stages of generation (memory, instruction, and reasoning). PRISM contains 9,448 instances across 65 tasks and supports fine-grained, stage-aware diagnostic evaluation. Evaluating 24 mainstream open-source and proprietary LLMs, we uncover consistent trade-offs across instruction following, memory retrieval, and logical reasoning, showing that mitigation strategies often improve specific dimensions at the expense of others. We hope PRISM provides a framework for understanding the specific mechanisms behind LLM hallucinations, ultimately accelerating the development of trustworthy large language models.

0 Citations

0 Influential

4.5 Altmetric

22.5 Score
