2604.04759v1 Apr 06, 2026 cs.CR

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

Zeyu Zheng (Citations: 78, h-index: 4)
C. Xie (Citations: 59, h-index: 4)
Huaxiu Yao (Citations: 106, h-index: 5)
Letian Zhang (Citations: 418, h-index: 11)
Tianyu Pang (Citations: 429, h-index: 11)
Yuyin Zhou (Citations: 683, h-index: 12)
Michael Shieh (Citations: 306, h-index: 9)
Hardy Chen (Citations: 52, h-index: 4)
Haoqin Tu (Citations: 896, h-index: 12)
Zijun Wang (Citations: 218, h-index: 6)
Jun Wu (Citations: 19, h-index: 2)
Zhenlong Yuan (Citations: 799, h-index: 9)
Xiangyan Liu (Citations: 117, h-index: 5)
Fengze Liu (Citations: 10, h-index: 2)

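OpenClaw, the most widely used personal AI agent as of early 2026, has full local access to the system and operates in integration with sensitive services such as Gmail, Stripe, and the filesystem. These broad privileges enable a high degree of automation and strong personalization, but they also expose a substantial attack surface that existing sandboxed evaluations struggle to capture. To close this gap, we present the first real-world safety evaluation of OpenClaw and introduce the CIK taxonomy, which unifies an agent's persistent state into three dimensions for safety analysis: Capability, Identity, and Knowledge. Our evaluation covers 12 attack scenarios on a live OpenClaw instance using the Claude Sonnet 4.5, Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 models. The results show that attacking a single CIK dimension raises the average attack success rate from 24.6% to 64-74%, and that even the most robust model shows more than a threefold increase over its baseline vulnerability. We also evaluate three CIK-based defense strategies together with a file-protection mechanism. Even the strongest defense still leaves a 63.8% attack success rate under Capability-targeted attacks, and file protection blocks 97% of malicious injections but also blocks legitimate updates. Taken together, these results show that the vulnerabilities are inherent to the agent architecture itself and that securing personal AI agents requires more systematic safeguards. The project page is https://ucsc-vlaa.github.io/CIK-Bench.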

Original Abstract

OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack surface that existing sandboxed evaluations fail to capture. To address this gap, we present the first real-world safety evaluation of OpenClaw and introduce the CIK taxonomy, which unifies an agent's persistent state into three dimensions, i.e., Capability, Identity, and Knowledge, for safety analysis. Our evaluations cover 12 attack scenarios on a live OpenClaw instance across four backbone models (Claude Sonnet 4.5, Opus 4.6, Gemini 3.1 Pro, and GPT-5.4). The results show that poisoning any single CIK dimension increases the average attack success rate from 24.6% to 64-74%, with even the most robust model exhibiting more than a threefold increase over its baseline vulnerability. We further assess three CIK-aligned defense strategies alongside a file-protection mechanism; however, the strongest defense still yields a 63.8% success rate under Capability-targeted attacks, while file protection blocks 97% of malicious injections but also prevents legitimate updates. Taken together, these findings show that the vulnerabilities are inherent to the agent architecture, necessitating more systematic safeguards to secure personal AI agents. Our project page is https://ucsc-vlaa.github.io/CIK-Bench.
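
To make the headline numbers concrete, the following is a minimal Python sketch of how an attack success rate (ASR) and its fold increase over baseline could be computed. Only the three dimension names and the percentages (24.6%, 64%, 74%) come from the abstract; the `CIKDimension`, `Trial`, `attack_success_rate`, and `fold_increase` names are hypothetical and are not taken from the CIK-Bench code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class CIKDimension(Enum):
    """The three persistent-state dimensions named in the abstract."""
    CAPABILITY = "capability"
    IDENTITY = "identity"
    KNOWLEDGE = "knowledge"


@dataclass(frozen=True)
class Trial:
    """One attack attempt (hypothetical record layout).

    `poisoned` is None for the unpoisoned baseline condition.
    """
    poisoned: Optional[CIKDimension]
    attack_succeeded: bool


def attack_success_rate(trials: List[Trial]) -> float:
    """Return the percentage of trials in which the attack succeeded."""
    if not trials:
        raise ValueError("need at least one trial")
    return 100.0 * sum(t.attack_succeeded for t in trials) / len(trials)


def fold_increase(baseline_pct: float, poisoned_pct: float) -> float:
    """Ratio of the poisoned ASR to the baseline ASR."""
    return poisoned_pct / baseline_pct


# Averages reported in the abstract (across the four backbone models).
BASELINE_ASR = 24.6
POISONED_ASR_RANGE = (64.0, 74.0)

low, high = (fold_increase(BASELINE_ASR, p) for p in POISONED_ASR_RANGE)
print(f"{low:.2f}x to {high:.2f}x")  # prints "2.60x to 3.01x"
```

On the averaged figures this gives roughly a 2.6x to 3.0x increase. The "more than threefold" claim in the abstract refers to the most robust individual model relative to its own baseline, which the abstract does not report, so it cannot be reproduced from these averages alone.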
