2605.07830v1 May 08, 2026 cs.CR

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

T. Lim (Citations: 2, h-index: 1)
S. Ju (Citations: 2, h-index: 1)
Mun-Kyom Kim (Citations: 0, h-index: 0)
Hyunjun Kim (Citations: 15, h-index: 2)
Hoki Kim (Citations: 312, h-index: 10)

Original Abstract

Large language models (LLMs) are increasingly deployed as autonomous agents in offensive cybersecurity. In this paper, we reveal an interesting phenomenon: different agents exhibit distinct attack patterns. Specifically, each agent exhibits an attack-selection bias, disproportionately concentrating its efforts on a narrow subset of attack families regardless of prompt variations. To systematically quantify this behavior, we introduce CyBiasBench, a comprehensive 630-session benchmark that evaluates five agents on three targets and four prompt conditions with ten attack families. We identify explicit bias across agents, with different dominant attack families and varying entropy levels in their attack-family allocation distributions. Such bias is better characterized as a trait of the agents, rather than a factor associated with the attack success rate. Furthermore, our experiments reveal a bias momentum effect, where agents resist explicit steering toward attack families that conflict with their bias. This forced distribution shift does not yield measurable improvements in attack performance. To ensure reproducibility and facilitate future research, we release an interactive result dashboard at https://trustworthyai.co.kr/CyBiasBench/ and a reproducibility artifact with aggregated session-level statistics and full evaluation scripts at https://github.com/Harry24k/CyBiasBench.
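
To make the phrase "entropy levels in their attack-family allocation distributions" concrete, here is a minimal sketch of one way such a quantity could be computed. It is not the authors' evaluation code: the abstract does not state the exact metric, so the use of Shannon entropy normalized by the logarithm of the number of families is an assumption, and the function names and the family labels in the usage note are hypothetical.

```python
import math
from collections import Counter


def allocation_distribution(attempts, families):
    """Share of attack attempts that falls into each family.

    `attempts` is a list of family labels, one per attempt (assumed format).
    `families` is the fixed list of families under evaluation.
    """
    counts = Counter(attempts)
    total = sum(counts[f] for f in families)
    if total == 0:
        raise ValueError("no attempts recorded for the given families")
    return {f: counts[f] / total for f in families}


def normalized_entropy(dist):
    """Shannon entropy of `dist`, scaled to [0, 1] by log(number of families).

    0 means every attempt went to a single family (strong concentration);
    1 means attempts were spread evenly over all families.
    """
    k = len(dist)
    if k < 2:
        return 0.0
    h = sum(-p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(k)


def dominant_family(dist):
    """Family that receives the largest share of attempts."""
    return max(dist, key=dist.get)
```

For example, with the hypothetical families `["sqli", "xss", "ssrf", "rce"]`, an agent whose attempts are all `"sqli"` has normalized entropy 0 and dominant family `"sqli"`, while an agent that spreads its attempts evenly across the four families has normalized entropy close to 1. Under this reading, a lower value corresponds to the narrower concentration the abstract describes.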
