arXiv:2602.20021v1 (cs.AI), Feb 23, 2026

Agents of Chaos

Reuth Mirsky

Natalie Shapira

C. Wendler

Avery Yen

Gabriele Sarti (Northeastern University)

Koyena Pal

Olivia Floody

Adam Belfki

Alexander R. Loftus

Aditya Ratan Jannali

Nikhil Prakash (Northeastern University)

Giordano Rogers

Jannik Brinkmann

C. Rager

Amir Zur

M. Ripa

Aruna Sankaranarayanan

David Atkinson

Rohit Gandikota

Jaden Fiotto-Kaufman

EunJeong Hwang

Hadas Orgad

P SAM SAHIL

Negev Taglicht

Tomer Shabtay

Atai Ambus

Nitay Alon

Shiri Oron

Ayelet Gordon-Tapiero

Yotam Kaplan

V. Shwartz

Tamar Rott Shaham

Christoph Riedl

M. Sap

David Manheim

Tomer D. Ullman

David Bau

Jasmine Cui

Abstract

We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions. Focusing on failures emerging from the integration of language models with autonomy, tool use, and multi-party communication, we document eleven representative case studies. Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. In several cases, agents reported task completion while the underlying system state contradicted those reports. We also report on some of the failed attempts. Our findings establish the existence of security-, privacy-, and governance-relevant vulnerabilities in realistic deployment settings. These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms, and warrant urgent attention from legal scholars, policymakers, and researchers across disciplines. This report serves as an initial empirical contribution to that broader conversation.