2602.07152v1 Feb 06, 2026 cs.CR

Trojans in Artificial Intelligence (TrojAI) Final Report

Kristopher W. Reese (Citations: 15; h-index: 2)
Taylor Kulp-McDowall (Citations: 1; h-index: 1)
Michael Majurski (Citations: 540; h-index: 9)
Timothy J. Blattner (Citations: 2; h-index: 1)
Derek Juba (Citations: 2; h-index: 1)
P. Bajcsy (Citations: 2,359; h-index: 23)
A. Cardone (Citations: 534; h-index: 10)
Philippe Dessauw (Citations: 83; h-index: 1)
A. Dima (Citations: 869; h-index: 15)
A. Kearsley (Citations: 65; h-index: 4)
Melinda Kleczynski (Citations: 3; h-index: 1)
Joel Vasanth (Citations: 102; h-index: 4)
Walid Keyrouz (Citations: 413; h-index: 9)
C. Ashcraft (Citations: 165; h-index: 7)
Neil Fendley (Citations: 743; h-index: 9)
Ted Staley (Citations: 2; h-index: 1)
T. Stout (Citations: 425; h-index: 3)
Josh Carney (Citations: 18; h-index: 2)
Gregory H. Canal (Citations: 517; h-index: 8)
Aurora Schmidt (Citations: 1; h-index: 1)
Cameron Hickert (Citations: 2; h-index: 1)
W. Paul (Citations: 487; h-index: 11)
Jared Markowitz (Citations: 2; h-index: 1)
N. Drenkow (Citations: 26; h-index: 3)
David Shriver (Software Engineering Institute; Citations: 142; h-index: 7)
Marissa Connor (Citations: 10; h-index: 2)
Keltin Grimes (Citations: 27; h-index: 3)
Marco Christiani (Citations: 10; h-index: 2)
Hayden Moore (Penn State University, Carnegie Mellon University; Citations: 8; h-index: 2)
Kasimir Gabert (Citations: 127; h-index: 6)
Uma Balakrishnan (Citations: 32; h-index: 2)
Satyanadh Gundimada (Citations: 213; h-index: 6)
John Jacobellis (Citations: 18; h-index: 2)
Sandya Lakkur (Citations: 31; h-index: 1)
V. Leung (Citations: 1,690; h-index: 22)
J. Roose (Citations: 4; h-index: 1)
F. Koushanfar (Citations: 30,699; h-index: 69)
G. Fields (Citations: 70; h-index: 3)
Xihe Gu (Citations: 5; h-index: 1)
Yaman Jandali (Citations: 11; h-index: 2)
Xinqiao Zhang (Citations: 241; h-index: 9)
Akash Vartak (Citations: 1; h-index: 1)
Benjamin Erichson (Citations: 39; h-index: 2)
Michael W. Mahoney (Citations: 916; h-index: 5)
Rauf Izmailov (Citations: 55; h-index: 3)
Xiangyu Zhang (Citations: 17; h-index: 2)
Guangyu Shen (Citations: 1,471; h-index: 21)
Si-Xuan Cheng (Citations: 3; h-index: 1)
Shiqing Ma (Citations: 180; h-index: 7)
Xiaofeng Wang (Citations: 33; h-index: 1)
Haixu Tang (Citations: 18; h-index: 2)
Di Tang (Citations: 663; h-index: 9)
Xiaoyin Chen (Citations: 153; h-index: 5)
Zihao Wang (Citations: 77; h-index: 3)
Rui Zhu (Citations: 34; h-index: 1)
Susmit Jha (Citations: 99; h-index: 5)
Xiaowei Lin (Citations: 10; h-index: 1)
Manoj Acharya (Citations: 191; h-index: 5)
Wenchao Li (Citations: 6; h-index: 2)
J. Widjaja (Citations: 22; h-index: 3)
Tim Oates (Citations: 27; h-index: 3)
Chaohao Chen (Citations: 19; h-index: 3)
William T. Redman (Citations: 223; h-index: 8)
Casey Battaglino (Citations: 394; h-index: 6)

Abstract

The Intelligence Advanced Research Projects Activity (IARPA) launched the TrojAI program to confront an emerging vulnerability in modern artificial intelligence: the threat of AI Trojans. These AI Trojans are malicious, hidden backdoors intentionally embedded within an AI model that can cause a system to fail in unexpected ways, or allow a malicious actor to hijack the AI model at will. This multi-year initiative helped to map out the complex nature of the threat, pioneered foundational detection methods, and identified unsolved challenges that require ongoing attention by the burgeoning AI security field. This report synthesizes the program's key findings, including methodologies for detection through weight analysis and trigger inversion, as well as approaches for mitigating Trojan risks in deployed models. Comprehensive test and evaluation results highlight detector performance, sensitivity, and the prevalence of "natural" Trojans. The report concludes with lessons learned and recommendations for advancing AI security research.

1 Citation
0 Influential
30 Altmetric
151.0 Score