2602.04674v1 Feb 04, 2026 cs.SI

Overstating Attitudes, Ignoring Networks: Biases of LLMs That Simulate Surveys and the Problems of Modeling Misinformation Susceptibility

Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility

Eun Cheol Choi

Citations: 84

h-index: 3

Lindsay E Young

Citations: 7

h-index: 2

Emilio Ferrara

Citations: 29

h-index: 3

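Large language models (LLMs) are increasingly used in computational social science as stand-ins for human judgment, but it is unclear how accurately they can reproduce patterns of susceptibility to misinformation. This study tests whether LLM-simulated survey respondents, prompted with participant profiles taken from social survey data (network, demographic, attitudinal, and behavioral features), can reproduce human patterns of misinformation belief and sharing. With three online surveys as baselines, the authors evaluate whether LLM outputs match the observed response distributions and whether they recover the feature-outcome associations present in the original survey data. LLM-generated responses reflect broad distributional tendencies and correlate modestly with human responses, but they consistently overstate the association between belief and sharing. Linear models fit to the simulated responses have substantially higher explained variance than models fit to the human responses; they overweight attitudinal and behavioral features and largely ignore personal network characteristics. Analyses of model-generated reasoning and of LLM training data suggest that these distortions reflect systematic biases in how misinformation-related concepts are represented. The findings suggest that LLM-based survey simulation is better suited to diagnosing systematic divergences from human judgment than to replacing it.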

Original Abstract

Large language models (LLMs) are increasingly used as proxies for human judgment in computational social science, yet their ability to reproduce patterns of susceptibility to misinformation remains unclear. We test whether LLM-simulated survey respondents, prompted with participant profiles drawn from social survey data measuring network, demographic, attitudinal and behavioral features, can reproduce human patterns of misinformation belief and sharing. Using three online surveys as baselines, we evaluate whether LLM outputs match observed response distributions and recover feature-outcome associations present in the original survey data. LLM-generated responses capture broad distributional tendencies and show modest correlation with human responses, but consistently overstate the association between belief and sharing. Linear models fit to simulated responses exhibit substantially higher explained variance and place disproportionate weight on attitudinal and behavioral features, while largely ignoring personal network characteristics, relative to models fit to human responses. Analyses of model-generated reasoning and LLM training data suggest that these distortions reflect systematic biases in how misinformation-related concepts are represented. Our findings suggest that LLM-based survey simulations are better suited for diagnosing systematic divergences from human judgment than for substituting it.
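The comparison described in the abstract, fitting the same linear model to human and to simulated responses and then contrasting explained variance and feature weights, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the two features (`attitude`, `network`), the effect sizes, and the helper `fit_ols` are hypothetical, and none of it is the authors' code or data.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(X)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
    resid = y - A @ beta
    ss_res = float(resid @ resid)                  # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    return beta, 1.0 - ss_res / ss_tot

# Synthetic respondents (assumption: standardized, independent features).
rng = np.random.default_rng(0)
n = 2000
attitude = rng.normal(size=n)   # hypothetical attitudinal feature
network = rng.normal(size=n)    # hypothetical personal-network feature
X = np.column_stack([attitude, network])

# Hypothetical "human" outcome: both features matter, responses are noisy.
human = 0.3 * attitude + 0.3 * network + rng.normal(scale=1.0, size=n)

# Hypothetical "simulated" outcome: attitude dominates, network is ignored,
# and there is much less noise, mimicking the distortion the abstract reports.
simulated = 0.9 * attitude + 0.0 * network + rng.normal(scale=0.3, size=n)

beta_h, r2_h = fit_ols(X, human)
beta_s, r2_s = fit_ols(X, simulated)

# Coefficient order: [intercept, attitude, network]
print("human:     coefficients", np.round(beta_h, 3), "R^2", round(r2_h, 3))
print("simulated: coefficients", np.round(beta_s, 3), "R^2", round(r2_s, 3))
```

In this toy setup the model fit to the simulated outcome shows a higher R^2 and a near-zero network coefficient by construction, so it only demonstrates how such a diagnostic could be read, not what the paper measured.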

0 Citations

0 Influential

1.5 Altmetric

7.5 Score
