2602.03586v1 Feb 03, 2026 cs.LG

APEX: Analyzing Neural Networks via Activation Perturbation

APEX: Probing Neural Networks via Activation Perturbation

Tao Ren (Citations: 68, h-index: 2)

Xiaoyu Luo (Citations: 17, h-index: 2)

Qiongxiu Li (Citations: 26, h-index: 3)

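Research on probing neural networks relies mainly on input-space analysis or parameter perturbation, both of which are fundamentally limited in accessing the structural information embedded in intermediate representations. This work introduces Activation Perturbation for EXploration (APEX), an inference-time probing paradigm that perturbs hidden activations while keeping the input and model parameters fixed. Theoretically, we show that activation perturbation suppresses input-specific signals and amplifies representation-level structure, inducing a systematic transition from sample-dependent to model-dependent behavior. We also prove that input perturbation is a constrained special case of this framework. Representative case studies demonstrate the practical advantages of APEX. In the small-noise regime, APEX provides a lightweight, efficient measure of sample regularity that agrees with existing metrics, while also distinguishing structured models from randomly labeled ones and revealing semantically coherent prediction transitions. In the large-noise regime, APEX exposes model-level biases induced by training, notably a concentration of predictions on the target class in backdoored models. Overall, the results show that APEX offers an effective perspective for exploring and understanding neural networks beyond what input space alone can reveal.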

Original Abstract

Prior work on probing neural networks primarily relies on input-space analysis or parameter perturbation, both of which face fundamental limitations in accessing structural information encoded in intermediate representations. We introduce Activation Perturbation for EXploration (APEX), an inference-time probing paradigm that perturbs hidden activations while keeping both inputs and model parameters fixed. We theoretically show that activation perturbation induces a principled transition from sample-dependent to model-dependent behavior by suppressing input-specific signals and amplifying representation-level structure, and further establish that input perturbation corresponds to a constrained special case of this framework. Through representative case studies, we demonstrate the practical advantages of APEX. In the small-noise regime, APEX provides a lightweight and efficient measure of sample regularity that aligns with established metrics, while also distinguishing structured from randomly labeled models and revealing semantically coherent prediction transitions. In the large-noise regime, APEX exposes training-induced model-level biases, including a pronounced concentration of predictions on the target class in backdoored models. Overall, our results show that APEX offers an effective perspective for exploring and understanding neural networks beyond what is accessible from input space alone.
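The probing idea in the abstract (fixed input, fixed weights, noise added only to hidden activations) can be illustrated with a small sketch. This is not the authors' implementation. The toy network, its hand-picked weights, the choice of isotropic Gaussian noise added after the ReLU, and the helper names `flip_rate` and `prediction_histogram` are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical toy network: 2 inputs -> 3 hidden units (ReLU) -> 2 classes.
# Weights are hand-picked for illustration, not taken from the paper.
W1 = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
B1 = [0.0, 0.0, 0.0]
W2 = [[1.0, -1.0, 0.2], [-1.0, 1.0, 0.1]]
B2 = [0.0, 0.0]


def affine(weights, bias, vec):
    """Return weights @ vec + bias for plain Python lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def hidden(x):
    """Hidden activations for input x (ReLU after the first layer)."""
    return [max(0.0, z) for z in affine(W1, B1, x)]


def predict_from_hidden(h):
    """Index of the largest logit, computed from hidden activations."""
    logits = affine(W2, B2, h)
    return max(range(len(logits)), key=lambda k: logits[k])


def perturbed_predictions(x, sigma, n_trials, rng):
    """Predictions under Gaussian noise added to the hidden activations.

    The input x and all weights stay fixed; only the activations change.
    """
    h = hidden(x)
    preds = []
    for _ in range(n_trials):
        noisy = [a + rng.gauss(0.0, sigma) for a in h]
        preds.append(predict_from_hidden(noisy))
    return preds


def flip_rate(x, sigma, n_trials=1000, seed=0):
    """Fraction of noisy predictions that differ from the clean one."""
    rng = random.Random(seed)
    clean = predict_from_hidden(hidden(x))
    preds = perturbed_predictions(x, sigma, n_trials, rng)
    return sum(p != clean for p in preds) / n_trials


def prediction_histogram(x, sigma, n_trials=1000, seed=0):
    """Count how often each class is predicted under activation noise."""
    rng = random.Random(seed)
    return Counter(perturbed_predictions(x, sigma, n_trials, rng))


far, near = [2.0, 0.0], [0.1, 0.0]  # far from / close to the decision boundary
print("small noise:", flip_rate(far, 0.1), flip_rate(near, 0.1))
print("large noise:", prediction_histogram(far, 100.0),
      prediction_histogram(near, 100.0))
```

In this toy setup, small noise separates the two inputs: the one near the decision boundary flips far more often than the one far from it, which is one way a per-sample stability score could be read. Under large noise the two histograms become nearly identical, because the noise swamps the input-specific activations and the output depends mostly on the fixed weights. This loosely mirrors the sample-dependent to model-dependent transition the abstract describes, under the assumptions stated above.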

1 Citations

0 Influential

1.5 Altmetric

8.5 Score
