2602.07152v1 Feb 06, 2026 cs.CR

Trojans in Artificial Intelligence (TrojAI) Final Report

Kristopher W. Reese, Taylor Kulp-McDowall, Michael Majurski, Timothy J. Blattner, Derek Juba, P. Bajcsy, A. Cardone, Philippe Dessauw, A. Dima, A. Kearsley, Melinda Kleczynski, Joel Vasanth, Walid Keyrouz, C. Ashcraft, Neil Fendley, Ted Staley, T. Stout, Josh Carney, Gregory H. Canal, Aurora Schmidt, Cameron Hickert, W. Paul, Jared Markowitz, N. Drenkow, David Shriver (Software Engineering Institute), Marissa Connor, Keltin Grimes, Marco Christiani, Hayden Moore (Penn State University, Carnegie Mellon University), Kasimir Gabert, Uma Balakrishnan, Satyanadh Gundimada, John Jacobellis, Sandya Lakkur, V. Leung, J. Roose, F. Koushanfar, G. Fields, Xihe Gu, Yaman Jandali, Xinqiao Zhang, Akash Vartak, Benjamin Erichson, Michael W. Mahoney, Rauf Izmailov, Xiangyu Zhang, Guangyu Shen, Si-Xuan Cheng, Shiqing Ma, Xiaofeng Wang, Haixu Tang, Di Tang, Xiaoyin Chen, Zihao Wang, Rui Zhu, Susmit Jha, Xiaowei Lin, Manoj Acharya, Wenchao Li, J. Widjaja, Tim Oates, Chaohao Chen, William T. Redman, Casey Battaglino

Original Abstract

The Intelligence Advanced Research Projects Activity (IARPA) launched the TrojAI program to confront an emerging vulnerability in modern artificial intelligence: the threat of AI Trojans. AI Trojans are malicious, hidden backdoors intentionally embedded in an AI model. They can cause a system to fail in unexpected ways or allow a malicious actor to hijack the model at will. This multi-year initiative helped map out the complex nature of the threat, pioneered foundational detection methods, and identified unsolved challenges that require ongoing attention from the burgeoning AI security field. This report synthesizes the program's key findings, including methodologies for detection through weight analysis and trigger inversion, as well as approaches for mitigating Trojan risks in deployed models. Comprehensive test and evaluation results highlight detector performance, sensitivity, and the prevalence of "natural" Trojans. The report concludes with lessons learned and recommendations for advancing AI security research.
