2602.08401v1 Feb 09, 2026 cs.AI

On Protecting Agentic Systems' Intellectual Property via Watermarking

Liwen Wang

Dongdong She

Juergen Rahmel

Zongjie Li

Yuchong Xie

Shuai Wang

Wei Wang

Original Abstract

The evolution of Large Language Models (LLMs) into agentic systems that perform autonomous reasoning and tool use has created significant intellectual property (IP) value. We demonstrate that these systems are highly vulnerable to imitation attacks, where adversaries steal proprietary capabilities by training imitation models on victim outputs. Crucially, existing LLM watermarking techniques fail in this domain because real-world agentic systems often operate as grey boxes, concealing the internal reasoning traces required for verification. This paper presents AGENTWM, the first watermarking framework designed specifically for agentic models. AGENTWM exploits the semantic equivalence of action sequences, injecting watermarks by subtly biasing the distribution of functionally identical tool execution paths. This mechanism allows AGENTWM to embed verifiable signals directly into the visible action trajectory while remaining indistinguishable to users. We develop an automated pipeline to generate robust watermark schemes and a rigorous statistical hypothesis testing procedure for verification. Extensive evaluations across three complex domains demonstrate that AGENTWM achieves high detection accuracy with negligible impact on agent performance. Our results confirm that AGENTWM effectively protects agentic IP against adaptive adversaries, who cannot remove the watermarks without severely degrading the stolen model's utility.
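The abstract describes the mechanism only at a high level (bias the choice among functionally identical tool-execution paths, then verify with a hypothesis test) and does not specify AGENTWM's scheme-generation pipeline or its test statistic. The sketch below is a minimal illustration under our own assumptions, not the authors' method: equivalence classes of action sequences are given up front, a keyed SHA-256 hash selects one favored variant per task context, and verification uses a one-sided z-test (the normal approximation familiar from green-list LLM watermarking) against a null hypothesis in which an unwatermarked agent picks uniformly among equivalent variants. All names (`favored_index`, `choose_variant`, `detect`, `Action`) are hypothetical.

```python
import hashlib
import math
import random

# Hypothetical representation: an action sequence is a tuple of tool names, and an
# equivalence class is a list of orderings assumed to be functionally identical.
Action = tuple[str, ...]


def favored_index(key: bytes, context: str, n_variants: int) -> int:
    """Keyed pseudo-random pick of the variant the watermark prefers in this context."""
    digest = hashlib.sha256(key + context.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_variants


def choose_variant(key: bytes, context: str, variants: list[Action],
                   bias: float, rng: random.Random) -> Action:
    """Embedding: with probability `bias` emit the favored variant, otherwise
    sample uniformly, so P(favored) = bias + (1 - bias) / k."""
    if rng.random() < bias:
        return variants[favored_index(key, context, len(variants))]
    return variants[rng.randrange(len(variants))]


def detect(key: bytes, observations: list[tuple[str, list[Action], Action]],
           alpha: float = 1e-3) -> tuple[float, float, bool]:
    """Verification: one-sided z-test of H0 'the suspect picks among k equivalent
    variants uniformly' (P(favored) = 1/k) against 'it over-picks favored variants'."""
    hits, expected, variance = 0, 0.0, 0.0
    for context, variants, chosen in observations:
        k = len(variants)
        if k < 2:
            continue  # a singleton class carries no watermark signal
        p0 = 1.0 / k
        if chosen == variants[favored_index(key, context, k)]:
            hits += 1
        expected += p0
        variance += p0 * (1.0 - p0)
    if variance == 0.0:
        raise ValueError("no observations with at least two equivalent variants")
    z = (hits - expected) / math.sqrt(variance)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return z, p_value, p_value < alpha
```

Keying the favored variant on the task context means the preferred ordering changes from task to task, so no single trajectory reveals a fixed pattern to users; the signal only emerges in aggregate over many observed trajectories. The uniform null is a simplifying assumption: a real agent's unwatermarked preferences over equivalent paths are unlikely to be uniform, so a practical test would need to estimate that baseline rather than assume 1/k.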