2604.17803v1 Apr 20, 2026 cs.AI

Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition

P. Goyal

Citations: 5,303

h-index: 11

Sattvik Sahai

Amazon

Citations: 59

h-index: 4

Michael Johnston

Citations: 17

h-index: 2

Hangjie Shi

Citations: 106

h-index: 5

Yao Lu

Citations: 89

h-index: 4

Shaohua Liu

Citations: 28

h-index: 3

Anna Rumshisky

Citations: 72

h-index: 2

Rahul Gupta

Citations: 6

h-index: 1

Anna Gottardi

Citations: 3,294

h-index: 9

Desheng Zhang

Citations: 43

h-index: 4

Lavina Vaz

Citations: 54

h-index: 4

Leslie Ball

Citations: 67

h-index: 3

Lucy Hu

Citations: 56

h-index: 3

Samyuth Sagi

Citations: 22

h-index: 2

Maureen Murray

Citations: 29

h-index: 2

Sankaranarayanan Ananthakrishnan

Citations: 857

h-index: 16

L. Dai

Citations: 108

h-index: 6

Abstract

Post-training Large Language Models requires diverse, high-quality data, which is rare and costly to obtain, especially in low-resource domains and for multi-turn conversations. Common solutions are crowdsourcing or synthetic generation, but both often yield low-quality or low-diversity data. We introduce Adversarial Arena for building high-quality conversational datasets by framing data generation as an adversarial task: attackers create prompts, and defenders generate responses. This interactive competition between multiple teams naturally produces diverse and complex data. We validated this approach by conducting a competition with 10 academic teams from top US and European universities, each building attacker or defender bots. The competition, focused on safety alignment of LLMs in cybersecurity, generated 19,683 multi-turn conversations. Fine-tuning an open-source model on this dataset produced an 18.47% improvement in secure code generation on CyberSecEval-Instruct and a 29.42% improvement on CyberSecEval-MITRE.
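The attacker/defender framing in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the names `Conversation`, `run_match`, `run_arena`, the toy bots, the fixed turn count, and the all-pairs matching scheme are assumptions made here for illustration only. The abstract does not specify how teams were paired or how many turns each conversation had.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a bot maps the conversation so far to its next message.
Turn = Tuple[str, str]            # (role, text)
Bot = Callable[[List[Turn]], str]


@dataclass
class Conversation:
    attacker: str
    defender: str
    turns: List[Turn] = field(default_factory=list)


def run_match(a_name: str, attacker: Bot, d_name: str, defender: Bot,
              max_turns: int = 3) -> Conversation:
    """Alternate attacker prompts and defender responses for max_turns rounds."""
    convo = Conversation(a_name, d_name)
    for _ in range(max_turns):
        convo.turns.append(("attacker", attacker(convo.turns)))
        convo.turns.append(("defender", defender(convo.turns)))
    return convo


def run_arena(attackers: Dict[str, Bot], defenders: Dict[str, Bot],
              max_turns: int = 3) -> List[Conversation]:
    """Assumed pairing scheme: every attacker plays every defender once."""
    return [run_match(a, attackers[a], d, defenders[d], max_turns)
            for a, d in product(attackers, defenders)]


# Toy stand-ins for the teams' bots (purely illustrative).
def toy_attacker(history: List[Turn]) -> str:
    return f"attack prompt {len(history) // 2 + 1}"


def toy_defender(history: List[Turn]) -> str:
    return f"reply to: {history[-1][1]}"


if __name__ == "__main__":
    dataset = run_arena({"A1": toy_attacker, "A2": toy_attacker},
                        {"D1": toy_defender}, max_turns=2)
    for convo in dataset:
        print(convo.attacker, "vs", convo.defender, convo.turns)
```

In this sketch, each collected `Conversation` is one multi-turn record. A real pipeline would presumably also label or judge the defender responses before using them for fine-tuning, which the abstract does not describe.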

