2602.07398v1 Feb 07, 2026 cs.CR

AgentSys: 명시적인 계층적 메모리 관리를 통한 안전하고 동적인 LLM 에이전트

AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management

Ruoyao Wen

Citations: 34

h-index: 3

Hao Li

Citations: 505

h-index: 9

Chaowei Xiao

Citations: 32

h-index: 3

Ning Zhang

Citations: 239

h-index: 6

간접 프롬프트 주입은 외부 콘텐츠에 악성 명령을 포함시켜 LLM 에이전트에 위협을 가하며, 이를 통해 무단 작업 수행 및 데이터 유출이 발생할 수 있습니다. LLM 에이전트는 의사 결정을 위해 컨텍스트 윈도우를 통해 작업 메모리를 유지하며, 기존 에이전트는 모든 도구 출력 및 추론 과정을 이 메모리에 무차별적으로 저장합니다. 이는 다음과 같은 두 가지 중요한 취약점을 야기합니다. (1) 주입된 명령은 전체 워크플로우를 통해 지속되어 공격자에게 행동을 조작할 수 있는 여러 기회를 제공하고, (2) 과도하고 불필요한 콘텐츠는 의사 결정 능력을 저하시킵니다. 기존의 방어 메커니즘은 이러한 과도한 메모리 문제를 해결하기보다는, 시스템이 공격에 강건하도록 만드는 데 초점을 맞춥니다. 본 논문에서는 명시적인 메모리 관리를 통해 간접 프롬프트 주입으로부터 보호하는 프레임워크인 AgentSys를 제시합니다. AgentSys는 운영 체제의 프로세스 메모리 격리 방식을 참고하여 에이전트를 계층적으로 구성합니다. 메인 에이전트는 도구 호출을 위해 워커 에이전트를 생성하며, 각 워커 에이전트는 격리된 컨텍스트에서 실행되고 하위 작업을 위한 중첩된 워커를 생성할 수 있습니다. 외부 데이터 및 하위 작업 추적 정보는 메인 에이전트의 메모리에 들어가지 않으며, 정해진 스키마에 따라 검증된 반환 값만이 결정적인 JSON 파싱을 통해 경계를 넘어 전달됩니다. 실험 결과, 격리만으로 공격 성공률을 2.19%까지 낮출 수 있으며, 검증기/정제기를 추가하면 이벤트 기반 검사를 통해 방어 성능을 더욱 향상시킬 수 있습니다. (이 검사의 오버헤드는 작업량에 따라 증가하며, 컨텍스트 길이에 비례하지 않습니다.) AgentSys는 AgentDojo 및 ASB 환경에서 각각 0.78% 및 4.25%의 공격 성공률을 보였으며, 방어되지 않은 기준 모델에 비해 유용성 측면에서도 약간의 개선을 보였습니다. 또한, AgentSys는 적응적인 공격에도 강건하며, 다양한 기반 모델에서도 일관된 성능을 보여줍니다. 이는 명시적인 메모리 관리가 안전하고 동적인 LLM 에이전트 아키텍처를 구축하는 데 중요한 역할을 한다는 것을 입증합니다. AgentSys의 코드는 다음 링크에서 확인할 수 있습니다: https://github.com/ruoyaow/agentsys-memory.

Original Abstract

Indirect prompt injection threatens LLM agents by embedding malicious instructions in external content, enabling unauthorized actions and data theft. LLM agents maintain working memory through their context window, which stores interaction history for decision-making. Conventional agents indiscriminately accumulate all tool outputs and reasoning traces in this memory, creating two critical vulnerabilities: (1) injected instructions persist throughout the workflow, granting attackers multiple opportunities to manipulate behavior, and (2) verbose, non-essential content degrades decision-making capabilities. Existing defenses treat bloated memory as given and focus on remaining resilient, rather than reducing unnecessary accumulation to prevent the attack. We present AgentSys, a framework that defends against indirect prompt injection through explicit memory management. Inspired by process memory isolation in operating systems, AgentSys organizes agents hierarchically: a main agent spawns worker agents for tool calls, each running in an isolated context and able to spawn nested workers for subtasks. External data and subtask traces never enter the main agent's memory; only schema-validated return values can cross boundaries through deterministic JSON parsing. Ablations show isolation alone cuts attack success to 2.19%, and adding a validator/sanitizer further improves defense with event-triggered checks whose overhead scales with operations rather than context length. On AgentDojo and ASB, AgentSys achieves 0.78% and 4.25% attack success while slightly improving benign utility over undefended baselines. It remains robust to adaptive attackers and across multiple foundation models, showing that explicit memory management enables secure, dynamic LLM agent architectures. Our code is available at: https://github.com/ruoyaow/agentsys-memory.

8 Citations

0 Influential

37.324746787308 Altmetric

194.6 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!