2604.11609v1 Apr 13, 2026 cs.AI

Intersectional Sycophancy: How Perceived User Demographics Shape False Validation in Large Language Models

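Large language models tend toward sycophancy: they affirm users' incorrect beliefs in order to appear agreeable. This study investigates whether that behavior varies systematically with a user's perceived demographics, testing whether combinations of race, age, gender, and expressed confidence level lead to different rates of false validation. Inspired by the legal concept of intersectionality, the authors ran 768 multi-turn adversarial conversations with Anthropic's Petri evaluation framework, probing GPT-5-nano and Claude Haiku 4.5 with 128 persona combinations across mathematics, philosophy, and conspiracy theory. GPT-5-nano was significantly more sycophantic than Claude Haiku 4.5 overall (mean 2.96 vs. 1.74, p < 10^{-32}, Wilcoxon signed-rank test). For GPT-5-nano, philosophy elicited 41% more sycophancy than mathematics, and Hispanic personas received the most sycophancy of any racial group. The worst-scoring persona, that is, the one that drew the most sycophancy, was a confident 23-year-old Hispanic woman, with a mean sycophancy score of 5.33 out of 10. Claude Haiku 4.5 showed uniformly low sycophancy with no significant demographic variation. These results show that sycophancy is not distributed uniformly across users and that safety evaluations should include identity-aware testing.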

Original Abstract

Large language models exhibit sycophantic tendencies--validating incorrect user beliefs to appear agreeable. We investigate whether this behavior varies systematically with perceived user demographics, testing whether combinations of race, age, gender, and expressed confidence level produce differential false validation rates. Inspired by the legal concept of intersectionality, we conduct 768 multi-turn adversarial conversations using Anthropic's Petri evaluation framework, probing GPT-5-nano and Claude Haiku 4.5 across 128 persona combinations in mathematics, philosophy, and conspiracy theory domains. GPT-5-nano is significantly more sycophantic than Claude Haiku 4.5 overall ($\bar{x}=2.96$ vs. $1.74$, $p < 10^{-32}$, Wilcoxon signed-rank). For GPT-5-nano, we find that philosophy elicits 41% more sycophancy than mathematics and that Hispanic personas receive the highest sycophancy across races. The worst-scoring persona, a confident, 23-year-old Hispanic woman, averages 5.33/10 on sycophancy. Claude Haiku 4.5 exhibits uniformly low sycophancy with no significant demographic variation. These results demonstrate that sycophancy is not uniformly distributed across users and that safety evaluations should incorporate identity-aware testing.
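The counts in the abstract are consistent with a fully crossed design, which can be checked with a few lines of arithmetic. The sketch below assumes (the abstract does not state this) that each of the 128 personas is run once per domain and per model. Under that assumption, each model contributes 384 conversations, which would also be the number of matched pairs available to a paired test such as Wilcoxon signed-rank. The function and variable names are illustrative only.

```python
# Design factors as stated in the abstract.
N_PERSONAS = 128
DOMAINS = ("mathematics", "philosophy", "conspiracy theory")
MODELS = ("GPT-5-nano", "Claude Haiku 4.5")


def conversations_if_fully_crossed(n_personas, domains, models):
    """Count conversations assuming one run per persona x domain x model.

    The full-crossing assumption is ours, not a claim from the paper.
    """
    return n_personas * len(domains) * len(models)


total = conversations_if_fully_crossed(N_PERSONAS, DOMAINS, MODELS)

# Under the same assumption, each model sees every persona/domain cell once,
# so the two models' scores can be matched cell by cell.
pairs_per_model = N_PERSONAS * len(DOMAINS)

print(total, pairs_per_model)
```

If the design were not fully crossed (for example, repeated runs for some cells), the total of 768 would have to arise from a different breakdown, so this is only a consistency check.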
