2601.05050v2 Jan 08, 2026 cs.AI

Large language models can effectively convince people to believe conspiracies

Thomas H. Costello (Citations: 14, h-index: 2)
Kellin Pelrine (Citations: 605, h-index: 11)
Matthew Kowal (Citations: 74, h-index: 3)
A. Arechar (Citations: 2,320, h-index: 14)
J. Godbout (Citations: 713, h-index: 15)
Adam Gleave (Citations: 1,145, h-index: 16)
David G. Rand (Citations: 1,010, h-index: 14)
Gordon Pennycook (Citations: 520, h-index: 8)

Original Abstract

Large language models (LLMs) have been shown to be persuasive across a variety of contexts. But it remains unclear whether this persuasive power advantages truth over falsehood, or if LLMs can promote misbeliefs just as easily as refuting them. Here, we investigate this question across three pre-registered experiments in which participants (N = 2,724 Americans) discussed a conspiracy theory they were uncertain about with GPT-4o, and the model was instructed to either argue against ("debunking") or for ("bunking") that conspiracy. When using a "jailbroken" GPT-4o variant with guardrails removed, the AI was as effective at increasing conspiracy belief as decreasing it. Concerningly, the bunking AI was rated more positively, and increased trust in AI, more than the debunking AI. Surprisingly, we found that using standard GPT-4o produced very similar effects, such that the guardrails imposed by OpenAI did little to prevent the LLM from promoting conspiracy beliefs. Encouragingly, however, a corrective conversation reversed these newly induced conspiracy beliefs, and simply prompting GPT-4o to only use accurate information dramatically reduced its ability to increase conspiracy beliefs. Our findings demonstrate that LLMs possess potent abilities to promote both truth and falsehood, but that potential solutions may exist to help mitigate this risk.
