2604.05593v1 Apr 07, 2026 cs.AI

Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge

Xing Sun

Citations: 5

h-index: 1

Si Qin

Citations: 9

h-index: 2

Isao Echizen

Citations: 10

h-index: 2

A. Ali

Citations: 59

h-index: 3

Saku Sugawara

Citations: 5

h-index: 1

Di Wu

Citations: 17

h-index: 2

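Large language models (LLMs) are increasingly used as automated evaluators (LLM-as-a-Judge). This study questions the reliability of LLM-as-a-Judge systems by showing that LLMs can be biased by disclosed source information (labels) when judging the trustworthiness of information. Using a counterfactual design, the authors find that both humans and LLM judges assign higher trust to information labeled as human-authored than to identical content labeled as AI-generated. Eye-tracking data show that humans use source labels as an important decision cue (heuristic) when making judgments. An analysis of the internal states of LLM judges during judgment shows that, across all conditions, models tend to attend more to the label region than to the content region, and that this tendency is stronger for "human-authored" labels than for "AI-generated" labels, which is consistent with human gaze patterns. In addition, decision uncertainty (measured from logits) is higher under the "AI-generated" label condition than under the "human-authored" label condition. These results suggest that source labels can serve as an important decision cue for both humans and LLMs. This raises concerns about the validity of label-sensitive LLM-as-a-Judge evaluation and warns that aligning models with human preferences may transfer human heuristic reliance into models. Unbiased evaluation and alignment methods are therefore needed.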

Original Abstract

Large language models (LLMs) are increasingly used as automated evaluators (LLM-as-a-Judge). This work challenges their reliability by showing that trust judgments by LLMs are biased by disclosed source labels. Using a counterfactual design, we find that both humans and LLM judges assign higher trust to information labeled as human-authored than to the same content labeled as AI-generated. Eye-tracking data reveal that humans rely heavily on source labels as heuristic cues for judgments. We also analyze LLM internal states during judgment. Across label conditions, models allocate denser attention to the label region than to the content region, and this label dominance is stronger under Human labels than under AI labels, consistent with the human gaze patterns. In addition, decision uncertainty measured by logits is higher under AI labels than under Human labels. These results indicate that the source label is a salient heuristic cue for both humans and LLMs. This raises validity concerns for label-sensitive LLM-as-a-Judge evaluation, and we cautiously suggest that aligning models with human preferences may propagate human heuristic reliance into models, motivating debiased evaluation and alignment.
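
The abstract names three quantities: a counterfactual trust gap between label conditions, attention density on the label region versus the content region, and decision uncertainty measured from logits. The abstract does not give their exact definitions, so the following is only a minimal sketch under stated assumptions: the trust gap is taken as a mean paired difference, attention density as mean attention weight per token in a span, and uncertainty as the Shannon entropy of a softmax over candidate-answer logits. All function names and all numbers are hypothetical and are not taken from the paper.

```python
import math


def label_effect(paired_scores):
    """Mean paired difference in trust: Human-labeled minus AI-labeled.

    paired_scores: list of (trust_under_human_label, trust_under_ai_label)
    for the SAME content, which is the counterfactual part of the design.
    ASSUMPTION: a simple mean difference; the paper may use another statistic.
    """
    diffs = [human - ai for human, ai in paired_scores]
    return sum(diffs) / len(diffs)


def attention_density(attn_weights, start, end):
    """Mean attention weight per token over the half-open span [start, end)."""
    span = attn_weights[start:end]
    return sum(span) / len(span)


def label_dominance(attn_weights, label_span, content_span):
    """Ratio of label-region density to content-region density.

    ASSUMPTION: values above 1 are read as 'denser attention on the label'.
    """
    label = attention_density(attn_weights, *label_span)
    content = attention_density(attn_weights, *content_span)
    return label / content


def decision_entropy(logits):
    """Shannon entropy (in nats) of the softmax over candidate-answer logits.

    ASSUMPTION: entropy stands in for 'decision uncertainty measured by logits'.
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)


# Toy, made-up inputs for illustration only.
toy_pairs = [(4.0, 3.0), (5.0, 4.5), (3.5, 3.0)]
toy_attention = [0.30, 0.20, 0.05, 0.05, 0.10, 0.10, 0.10, 0.10]

gap = label_effect(toy_pairs)
dominance = label_dominance(toy_attention, (0, 2), (2, 8))
entropy_confident = decision_entropy([2.0, 0.0])
entropy_uncertain = decision_entropy([0.5, 0.0])
```

In this toy setup the first two tokens play the role of the label region and the remaining six the content region. With a real model, the attention vector would have to come from the model's own attention outputs, and the choice of layer and head aggregation is a further assumption this sketch does not address.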

1 Citation

0 Influential

1.5 Altmetric

8.5 Score
