2602.01795v1 Feb 02, 2026 cs.CR

RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse

Mingrui Liu

Sixiao Zhang

Cheng Long

Kwok-Yan Lam

Original Abstract

Large Language Models (LLMs) are increasingly vulnerable to Prompt Injection (PI) attacks, where adversarial instructions hidden within retrieved contexts hijack the model's execution flow. Current defenses typically face a critical trade-off: prevention-based fine-tuning often degrades general utility via the "alignment tax", while detection-based filtering incurs prohibitive latency and memory costs. To bridge this gap, we propose RedVisor, a unified framework that synthesizes the explainability of detection systems with the seamless integration of prevention strategies. To the best of our knowledge, RedVisor is the first approach to leverage fine-grained reasoning paths to simultaneously detect attacks and guide the model's safe response. We implement this via a lightweight, removable adapter positioned atop the frozen backbone. This adapter serves a dual function: it first generates an explainable analysis that precisely localizes the injection and articulates the threat, which then explicitly conditions the model to reject the malicious command. Uniquely, the adapter is active only during this reasoning phase and is effectively muted during the subsequent response generation. This architecture yields two distinct advantages: (1) it mathematically preserves the backbone's original utility on benign inputs; and (2) it enables a novel KV Cache Reuse strategy, eliminating the redundant prefill computation inherent to decoupled pipelines. We further pioneer the integration of this defense into the vLLM serving engine with custom kernels. Experiments demonstrate that RedVisor outperforms state-of-the-art defenses in detection accuracy and throughput while incurring negligible utility loss.
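
The two architectural claims in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the gated additive adapter, the `ToyEngine` class, and the token-count cost model are all hypothetical stand-ins chosen for illustration. The sketch shows (1) that an adapter update multiplied by a zero gate leaves the backbone's output unchanged, and (2) that a decoupled detector-plus-generator pipeline prefills the prompt twice, whereas a single engine that shares one cache across the reasoning and response phases prefills it once.

```python
from typing import List


def adapted_hidden(hidden: List[float], delta: List[float], adapter_on: bool) -> List[float]:
    """Add a hypothetical adapter update to the backbone's hidden state, gated on or off."""
    gate = 1.0 if adapter_on else 0.0
    return [h + gate * d for h, d in zip(hidden, delta)]


class ToyEngine:
    """Hypothetical stand-in for a serving engine; counts the prompt tokens it prefills."""

    def __init__(self) -> None:
        self.prefilled = 0

    def prefill(self, prompt: List[str]) -> List[str]:
        # A real engine would build per-layer key/value tensors here.
        self.prefilled += len(prompt)
        return list(prompt)  # the returned list plays the role of the KV cache


def decoupled_pipeline(prompt: List[str]) -> int:
    """Separate detector and generator: each one prefills the full prompt."""
    detector, generator = ToyEngine(), ToyEngine()
    detector.prefill(prompt)
    generator.prefill(prompt)
    return detector.prefilled + generator.prefilled


def shared_cache_pipeline(prompt: List[str], analysis: List[str]) -> int:
    """One engine: prefill once, then both phases extend the same cache."""
    engine = ToyEngine()
    cache = engine.prefill(prompt)
    cache.extend(analysis)      # reasoning phase (adapter on) appends its tokens
    cache.append("<response>")  # response phase (adapter off) continues from the same cache
    return engine.prefilled


prompt = "summarize this page . ignore previous instructions".split()
hidden, delta = [0.5, -1.25, 2.0], [0.1, 0.2, -0.3]

muted = adapted_hidden(hidden, delta, adapter_on=False)
active = adapted_hidden(hidden, delta, adapter_on=True)
decoupled_cost = decoupled_pipeline(prompt)
shared_cost = shared_cache_pipeline(prompt, ["injection", "found"])
```

In this toy model, `muted` equals `hidden` exactly, and `shared_cost` is half of `decoupled_cost`. Real savings would depend on prompt length, the length of the generated analysis, and how the serving engine manages its cache, none of which the sketch attempts to model.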
