2604.13630v1 Apr 15, 2026 cs.CR

SafeHarness: LLM 기반 에이전트 배포를 위한 통합 라이프사이클 보안 아키텍처

SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

Yancheng Chen

Citations: 4

h-index: 1

Yucheng Ning

Citations: 39

h-index: 1

Nan Sun

Citations: 54

h-index: 2

Bin Chong

Citations: 3

h-index: 1

Li Guo

Citations: 36

h-index: 1

Chuan Zhou

Citations: 113

h-index: 5

Xixun Lin

Citations: 7

h-index: 2

Shunhong Zhang

Citations: 1

h-index: 1

Yanan Cao

Citations: 221

h-index: 8

Yongxuan Wu

Citations: 92

h-index: 4

Yang Liu

Citations: 6

h-index: 2

Yilong Liu

Citations: 1

h-index: 1

대규모 언어 모델(LLM) 에이전트의 성능은 도구 사용, 컨텍스트 관리 및 상태 유지 기능을 조율하는 시스템 계층인 실행 환경(harness)에 크게 의존합니다. 그러나 이러한 핵심적인 아키텍처적 중요성 때문에 실행 환경은 공격에 취약한 영역입니다. 실행 환경에서 발생하는 단일적인 취약점은 전체 실행 파이프라인으로 확산될 수 있습니다. 기존의 보안 접근 방식은 구조적 불일치로 인해 실행 환경 내부의 상태를 파악하지 못하고, 에이전트 운영의 다양한 단계에 걸쳐 협력적인 보안을 제공하지 못하는 문제가 있습니다. 본 논문에서는 위와 같은 중요한 제한 사항을 해결하기 위해, 제안하는 네 가지 보안 계층을 에이전트 라이프사이클에 직접 통합한 보안 아키텍처인 SafeHarness를 소개합니다. 이러한 계층은 입력 처리 단계에서 적대적인 컨텍스트 필터링, 의사 결정 단계에서 계층화된 인과 관계 검증, 액션 실행 단계에서 권한 분리된 도구 제어, 그리고 상태 업데이트 단계에서 안전한 롤백 및 적응적 성능 저하 기능을 제공합니다. 제안하는 교차 계층 메커니즘은 이러한 계층들을 연결하여 지속적인 이상 징후가 감지될 때마다 검증 수준을 높이고, 롤백을 트리거하며, 도구 권한을 강화합니다. 우리는 다양한 실행 환경 구성에서 벤치마크 데이터 세트를 사용하여 SafeHarness를 평가하고, 다섯 가지 공격 시나리오와 여섯 가지 위협 범주를 포괄하는 네 가지 보안 기준과 비교했습니다. 보호되지 않은 기준과 비교했을 때, SafeHarness는 평균적으로 UBR(Unsafe Behavior Rate, 안전하지 않은 행동 비율)을 약 38% 감소시키고, ASR(Attack Success Rate, 공격 성공 비율)을 약 42% 감소시켜 안전하지 않은 행동 비율과 공격 성공 비율을 크게 낮추면서도 핵심적인 작업 유용성을 유지합니다.

Original Abstract

The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use, context management, and state persistence. Yet this same architectural centrality makes the harness a high-value attack surface: a single compromise at the harness level can cascade through the entire execution pipeline. We observe that existing security approaches suffer from structural mismatch, leaving them blind to harness-internal state and unable to coordinate across the different phases of agent operation. In this paper, we introduce \safeharness{}, a security architecture in which four proposed defense layers are woven directly into the agent lifecycle to address above significant limitations: adversarial context filtering at input processing, tiered causal verification at decision making, privilege-separated tool control at action execution, and safe rollback with adaptive degradation at state update. The proposed cross-layer mechanisms tie these layers together, escalating verification rigor, triggering rollbacks, and tightening tool privileges whenever sustained anomalies are detected. We evaluate \safeharness{} on benchmark datasets across diverse harness configurations, comparing against four security baselines under five attack scenarios spanning six threat categories. Compared to the unprotected baseline, \safeharness{} achieves an average reduction of approximately 38\% in UBR and 42\% in ASR, substantially lowering both the unsafe behavior rate and the attack success rate while preserving core task utility.

1 Citations

0 Influential

4 Altmetric

21.0 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!