2606.13079v1 Jun 11, 2026 cs.CR

The Emergence of Autonomous Penetration Capabilities in Large Language Model-Powered AI Systems

Geng Hong
Xu Pan
Jiarun Dai
Min Yang
Jiaqi Luo
Zhile Chen
Yawen Duan
Brian Tse
Jia Xu
Weibing Wang
Yuan Zhang

The autonomous execution of cyberattacks capable of causing substantial real-world harm is widely regarded as one of the critical red lines that frontier AI systems must not cross. Within this red-line scenario, autonomous penetration is a core enabling capability and subtask: the ability of an LLM-powered AI system to independently conduct adversarial operations against a target server without human intervention, identify and exploit vulnerabilities, and obtain unauthorized access or control. A growing body of work has sought to assess the autonomous penetration capabilities of AI systems. However, existing evaluations often employ opaque methodologies, rely on unrealistic or overly simplified penetration-testing scenarios, or provide LLMs with excessive prior knowledge and task-specific guidance. As a result, they cannot accurately capture the extent to which modern AI systems can autonomously perform this core capability within broader high-impact cyberattack scenarios. To address these limitations, we construct a new autonomous penetration evaluation framework consisting of two components: target servers and agent scaffolding. On the target-server side, we design two tiers of target environments, distinguished by the number of secure services (services without known vulnerabilities) deployed alongside a vulnerable service: Tier 1 (one secure service) and Tier 2 (three secure services), for a total of 300 target servers. The agent scaffolding adopts a general-purpose agent architecture equipped with a set of general-purpose cybersecurity tools and is given no target-specific prior knowledge. We evaluate 19 open-weight and proprietary LLMs and find that current models achieve penetration success rates ranging from 10.7% to 69.3%. Moreover, we observe that autonomous penetration capability continues to improve alongside advances in overall model capability.
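
The tier definition and the success-rate metric described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `TargetServer` record, its field names, the `success_rate` helper, and the service names in the usage lines are all hypothetical, and the sketch assumes a single pass/fail outcome per target, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetServer:
    """Hypothetical record for one target server (not from the paper)."""
    server_id: str
    vulnerable_service: str
    secure_services: tuple[str, ...]  # services without known vulnerabilities

    @property
    def tier(self) -> int:
        # Per the abstract: Tier 1 = one secure service, Tier 2 = three.
        tiers = {1: 1, 3: 2}
        count = len(self.secure_services)
        if count not in tiers:
            raise ValueError(f"unsupported number of secure services: {count}")
        return tiers[count]


def success_rate(outcomes: dict[str, bool]) -> float:
    """Percentage of targets penetrated, rounded to one decimal place.

    Assumes one boolean outcome per target server (an assumption of this sketch).
    """
    if not outcomes:
        raise ValueError("no outcomes recorded")
    return round(100.0 * sum(outcomes.values()) / len(outcomes), 1)


# Illustrative usage with made-up targets and outcomes.
tier1 = TargetServer("t1-001", "web-app", ("ssh",))
tier2 = TargetServer("t2-001", "web-app", ("ssh", "dns", "mail"))
print(tier1.tier, tier2.tier)
print(success_rate({"t1-001": True, "t2-001": False, "t2-002": False}))
```

Deriving the tier from the number of secure services, rather than storing it separately, keeps the label consistent with the definition given in the abstract.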
