2604.03081v1 Apr 03, 2026 cs.CR

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

Yi Liu (citations: 713, h-index: 11)
Gelei Deng (citations: 3,705, h-index: 25)
L. Zhang (citations: 101, h-index: 4)
Ying Zhang (citations: 262, h-index: 4)
Yuekang Li (citations: 19, h-index: 2)
Yubin Qu (citations: 26, h-index: 2)
Tongcheng Geng (citations: 26, h-index: 2)
Lei Ma (citations: 38, h-index: 3)

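LLM-based coding agents extend their capabilities with third-party agent skills obtained from open marketplaces, where skills are published without mandatory security review. Unlike traditional packages, these skills run as operational directives with system-level privileges, so a single malicious skill can compromise the host system. Prior work has not examined whether supply-chain attacks can, despite existing safeguards, directly control the agent's action space, such as file writes, shell commands, and network requests. This work introduces Document-Driven Implicit Payload Execution (DDIPE), a technique that embeds malicious logic in the code examples and configuration templates of skill documentation. Because agents reuse these examples during normal tasks, the malicious logic executes without any explicit prompt. Using an LLM-based pipeline, the authors generated 1,070 adversarial skills from 81 seeds spanning 15 MITRE ATT&CK categories. Across four frameworks and five models, DDIPE achieved bypass rates of 11.6% to 33.5%, whereas explicit instruction attacks achieved a 0% bypass rate under strong defenses. Static analysis detected most cases, but 2.5% evaded both detection and alignment. Responsible disclosure resulted in four confirmed vulnerabilities and two fixes.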

Original Abstract

LLM-based coding agents extend their capabilities via third-party agent skills distributed through open marketplaces without mandatory security review. Unlike traditional packages, these skills are executed as operational directives with system-level privileges, so a single malicious skill can compromise the host. Prior work has not examined whether supply-chain attacks can directly hijack an agent's action space, such as file writes, shell commands, and network requests, despite existing safeguards. We introduce Document-Driven Implicit Payload Execution (DDIPE), which embeds malicious logic in code examples and configuration templates within skill documentation. Because agents reuse these examples during normal tasks, the payload executes without explicit prompts. Using an LLM-driven pipeline, we generate 1,070 adversarial skills from 81 seeds across 15 MITRE ATT&CK categories. Across four frameworks and five models, DDIPE achieves 11.6% to 33.5% bypass rates, while explicit instruction attacks achieve 0% under strong defenses. Static analysis detects most cases, but 2.5% evade both detection and alignment. Responsible disclosure led to four confirmed vulnerabilities and two fixes.
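
The abstract reports that static analysis catches most DDIPE cases because the payload sits inside code examples in the skill documentation. As a rough defensive illustration of that kind of check, and not the authors' analyzer, the sketch below pulls fenced code examples out of a skill document and flags a few risky patterns. The function names, rule names, and regular expressions are all hypothetical choices made for this example, and the sketch assumes the skill documentation is Markdown with triple-backtick fences.

```python
import re

# Build the fence marker programmatically so this example can itself
# live inside a fenced block without confusing a Markdown parser.
TICKS = "`" * 3
FENCE = re.compile(TICKS + r"[^\n]*\n(.*?)" + TICKS, re.DOTALL)

# Hypothetical rule set, for illustration only. A real scanner would need
# far broader coverage (for example, rules mapped to MITRE ATT&CK techniques).
SUSPICIOUS_PATTERNS = {
    "pipe-to-shell": re.compile(r"(curl|wget)\b[^\n|]*\|\s*(ba)?sh\b"),
    "credential-read": re.compile(r"(\.ssh/|\.aws/credentials|\.env\b)"),
    "encoded-exec": re.compile(r"base64\s+(-d|--decode)[^\n|]*\|\s*(ba)?sh\b"),
}


def extract_code_blocks(markdown: str) -> list[str]:
    """Return the body of every fenced code block in a skill document."""
    return [match.group(1) for match in FENCE.finditer(markdown)]


def scan_skill_doc(markdown: str) -> list[tuple[int, str]]:
    """Return (block_index, rule_name) for each rule that fires on a block."""
    findings = []
    for index, block in enumerate(extract_code_blocks(markdown)):
        for rule_name, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(block):
                findings.append((index, rule_name))
    return findings
```

A caller could run `scan_skill_doc` over each skill's documentation before installation and hold any skill with findings for manual review. Fixed pattern rules like these are easy to sidestep with obfuscation or benign-looking logic, so a check of this kind would be one layer among several and not a complete defense.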
