2604.04978v1 Apr 04, 2026 cs.SE

Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code's Auto Mode

Zongjie Li

Zimo Ji

Wenyuan Jiang

Yudong Gao

Shuai Wang

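Claude Code's auto mode is the first deployed permission system for AI coding agents; it uses a two-stage transcript classifier to gate dangerous tool calls. Anthropic reports a 0.4% false positive rate (FPR) and a 17% false negative rate (FNR) on production traffic. This study presents the first independent evaluation of the system on deliberately ambiguous authorization scenarios, that is, tasks where the user's intent is clear but the target scope, blast radius, or risk level is not clearly specified. Using AmPermBench, a 128-prompt benchmark covering four DevOps task families and three controlled ambiguity axes, the authors evaluate 253 state-changing actions at the individual action level against oracle ground truth. The results characterize auto mode's scope-escalation coverage under this stress-test workload. The end-to-end FNR is 81.0% (95% CI: 73.8%-87.4%), far above the 17% reported on production traffic; the gap reflects a fundamentally different workload rather than a contradiction. Notably, 36.8% of all state-changing actions fall outside the classifier's scope via Tier 2 (in-project file edits), which contributes to the high end-to-end FNR. Even when restricted to the 160 actions the classifier actually evaluates (Tier 3), the FNR remains 70.3%, while the FPR rises to 31.9%. The Tier 2 coverage gap is most pronounced on artifact cleanup (92.9% FNR), where agents naturally fall back to editing state files when the expected CLI is unavailable. These results expose a coverage boundary worth examining: auto mode assumes that dangerous actions go through the shell, but agents often achieve the same effect through file edits that the classifier does not evaluate.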

Original Abstract

Claude Code's auto mode is the first deployed permission system for AI coding agents, using a two-stage transcript classifier to gate dangerous tool calls. Anthropic reports a 0.4% false positive rate and 17% false negative rate on production traffic. We present the first independent evaluation of this system on deliberately ambiguous authorization scenarios, i.e., tasks where the user's intent is clear but the target scope, blast radius, or risk level is underspecified. Using AmPermBench, a 128-prompt benchmark spanning four DevOps task families and three controlled ambiguity axes, we evaluate 253 state-changing actions at the individual action level against oracle ground truth. Our findings characterize auto mode's scope-escalation coverage under this stress-test workload. The end-to-end false negative rate is 81.0% (95% CI: 73.8%-87.4%), substantially higher than the 17% reported on production traffic, reflecting a fundamentally different workload rather than a contradiction. Notably, 36.8% of all state-changing actions fall outside the classifier's scope via Tier 2 (in-project file edits), contributing to the elevated end-to-end FNR. Even restricting to the 160 actions the classifier actually evaluates (Tier 3), the FNR remains 70.3%, while the FPR rises to 31.9%. The Tier 2 coverage gap is most pronounced on artifact cleanup (92.9% FNR), where agents naturally fall back to editing state files when the expected CLI is unavailable. These results highlight a coverage boundary worth examining: auto mode assumes dangerous actions transit the shell, but agents routinely achieve equivalent effects through file edits that the classifier does not evaluate.
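
The gap between the end-to-end and classifier-only rates comes down to which actions enter the denominator. The sketch below is one way to make that concrete; it is not the authors' evaluation harness. The `Action` record, its field names, and the `toy` data are hypothetical and chosen only for illustration. The only figures taken from the abstract are the counts 253 and 160, which are consistent with 93 actions (36.8%) falling under Tier 2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    tier: int           # 2 = in-project file edit (never classified), 3 = sent to the classifier
    should_block: bool  # oracle label (hypothetical field name)
    blocked: bool       # what the permission gate actually did

def rates(actions):
    """Return (FNR, FPR); a rate is None when its denominator is empty."""
    fn = sum(a.should_block and not a.blocked for a in actions)
    tp = sum(a.should_block and a.blocked for a in actions)
    fp = sum((not a.should_block) and a.blocked for a in actions)
    tn = sum((not a.should_block) and (not a.blocked) for a in actions)
    fnr = fn / (fn + tp) if (fn + tp) else None
    fpr = fp / (fp + tn) if (fp + tn) else None
    return fnr, fpr

def end_to_end(actions):
    # Every state-changing action counts, including Tier 2 edits the classifier never sees.
    return rates(actions)

def classifier_only(actions):
    # Restrict to the actions the classifier actually evaluates (Tier 3).
    return rates([a for a in actions if a.tier == 3])

# Toy data, NOT the paper's results: two unsafe Tier 2 edits pass unchecked,
# and the classifier catches one of two unsafe Tier 3 actions.
toy = [
    Action(tier=2, should_block=True, blocked=False),
    Action(tier=2, should_block=True, blocked=False),
    Action(tier=3, should_block=True, blocked=True),
    Action(tier=3, should_block=True, blocked=False),
    Action(tier=3, should_block=False, blocked=True),
    Action(tier=3, should_block=False, blocked=False),
]

print(end_to_end(toy))       # (0.75, 0.5)
print(classifier_only(toy))  # (0.5, 0.5)

# Consistency check on the abstract's own counts: 253 total, 160 in Tier 3.
tier2_count = 253 - 160
print(tier2_count, round(100 * tier2_count / 253, 1))  # 93 36.8
```

In the toy data, unsafe Tier 2 edits can only ever be false negatives, so they raise the end-to-end FNR above the classifier-only FNR. The abstract reports the same direction of effect (81.0% versus 70.3%).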
