2601.06666v1 Jan 10, 2026 cs.CL

InFi-Check: Interpretable and Fine-Grained Fact-Checking of LLMs

Yuzhuo Bai

Citations: 1,899

h-index: 8

Shuzheng Si

Citations: 32

h-index: 3

Kangyang Luo

Citations: 73

h-index: 5

Qingyi Wang

Citations: 24

h-index: 3

Wenhao Li

Citations: 61

h-index: 4

Gang Chen

Citations: 91

h-index: 6

Fanchao Qi

Tsinghua University

Citations: 3,599

h-index: 24

Maosong Sun

Citations: 94

h-index: 5

Original Abstract

Large language models (LLMs) often hallucinate, yet most existing fact-checking methods treat factuality evaluation as a binary classification problem, offering limited interpretability and failing to capture fine-grained error types. In this paper, we introduce InFi-Check, a framework for interpretable and fine-grained fact-checking of LLM outputs. Specifically, we first propose a controlled data synthesis pipeline that generates high-quality data featuring explicit evidence, fine-grained error type labels, justifications, and corrections. Based on this, we further construct large-scale training data and a manually verified benchmark InFi-Check-FG for fine-grained fact-checking of LLM outputs. Building on these high-quality training data, we further propose InFi-Checker, which can jointly provide supporting evidence, classify fine-grained error types, and produce justifications along with corrections. Experiments show that InFi-Checker achieves state-of-the-art performance on InFi-Check-FG and strong generalization across various downstream tasks, significantly improving the utility and trustworthiness of factuality evaluation.

0 Citations

0 Influential

12 Altmetric

60.0 Score
