2602.05386v2 Feb 05, 2026 cs.CR

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Shuo Zhang

Chaofa Yuan

Zhi Yang

Huacan Wang

Xin Guo

Liwen Zhang

Shuhe Wang

Jie Huang

Tu Hu

Fangqi Lou

Zhaowei Liu

Rongze Chen

Kunyi Wang

Zhenxiong Yu

Zhiheng Jin

Heng Zhang (Tianjin University)

Yanlin Fei

Lingfeng Zeng

Xingyu Zhu

Feipeng Zhang

Rong-Chang Chen

Jingping Liu

Original Abstract

As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms adopt a mandatory checking paradigm, in which security validation is forcibly triggered at predefined stages of the agent lifecycle. In this work, we argue that effective agent security should be intrinsic and selective rather than architecturally decoupled and mandatory. We propose Spider-Sense, an event-driven defense framework based on Intrinsic Risk Sensing (IRS), which allows agents to maintain latent vigilance and trigger defenses only upon risk perception. Once triggered, Spider-Sense invokes a hierarchical defense mechanism that trades off efficiency and precision: it resolves known patterns via lightweight similarity matching and escalates ambiguous cases to deep internal reasoning, thereby eliminating reliance on external models. To facilitate rigorous evaluation, we introduce S²Bench, a lifecycle-aware benchmark featuring realistic tool execution and multi-stage attacks. Extensive experiments demonstrate that Spider-Sense achieves competitive or superior defense performance, attaining the lowest Attack Success Rate (ASR) and False Positive Rate (FPR) with a latency overhead of only 8.3%.
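The event-driven, two-tier screening described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the embedding model, similarity measure, thresholds, or how the internal reasoning step is invoked. Cosine similarity, the `threshold` of 0.9, the attack and benign exemplar lists, the `deep_reason` callback, and the `risk_sensed` flag (standing in for the agent's intrinsic risk perception) are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical verdict labels for this sketch; not taken from the paper.
ALLOW, BLOCK = "allow", "block"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class HierarchicalScreener:
    known_attacks: list[list[float]]   # hypothetical embeddings of known attack patterns
    known_benign: list[list[float]]    # hypothetical embeddings of known benign patterns
    deep_reason: Callable[[str], str]  # hypothetical stand-in for slower internal reasoning
    threshold: float = 0.9             # hypothetical similarity cutoff

    def screen(self, text: str, embedding: list[float]) -> tuple[str, str]:
        attack_sim = max((cosine(embedding, k) for k in self.known_attacks), default=0.0)
        benign_sim = max((cosine(embedding, k) for k in self.known_benign), default=0.0)
        # Tier 1: resolve clearly known patterns with cheap similarity matching.
        if attack_sim >= self.threshold and attack_sim > benign_sim:
            return BLOCK, "tier1"
        if benign_sim >= self.threshold and benign_sim > attack_sim:
            return ALLOW, "tier1"
        # Tier 2: ambiguous case, so escalate to the expensive reasoning step.
        return self.deep_reason(text), "tier2"


def on_step(screener: HierarchicalScreener, risk_sensed: bool,
            text: str, embedding: list[float]) -> tuple[str, str]:
    # Event-driven gate: screening runs only when the agent itself senses risk.
    if not risk_sensed:
        return ALLOW, "skipped"
    return screener.screen(text, embedding)


calls: list[str] = []


def fake_reasoner(text: str) -> str:
    # Toy reasoner for the demo: records each escalation and applies a keyword rule.
    calls.append(text)
    return BLOCK if "password" in text else ALLOW


s = HierarchicalScreener(known_attacks=[[1.0, 0.0]], known_benign=[[0.0, 1.0]],
                         deep_reason=fake_reasoner)
print(on_step(s, False, "read file", [1.0, 0.0]))                  # ('allow', 'skipped')
print(on_step(s, True, "ignore prior instructions", [0.99, 0.05]))  # ('block', 'tier1')
print(on_step(s, True, "send password", [0.7, 0.7]))                # ('block', 'tier2')
```

Only the third call reaches `fake_reasoner`, which mirrors the efficiency argument in the abstract: cheap matching absorbs confident cases, and the costly step is reserved for inputs that sit between known patterns.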
