2602.06613v1 Feb 06, 2026 cs.CV

DAVE: Distribution-aware Attribution via ViT Gradient Decomposition

Adam Wróbel

Citations: 1

h-index: 1

Siddhartha Gairola

Citations: 294

h-index: 7

Jacek Tabor

Citations: 34

h-index: 4

B. Schiele

Citations: 94,533

h-index: 136

Bartosz Zieliński

Citations: 4

h-index: 1

Dawid Rymarczyk

Citations: 679

h-index: 11

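Vision Transformers (ViTs) have become a mainstream architecture in computer vision, but generating stable, high-resolution attribution maps for these models remains difficult. Architectural components such as patch embeddings and attention routing often cause structured artifacts in pixel-level explanations, which leads many existing methods to fall back on coarse patch-level attributions. This paper introduces DAVE (Distribution-aware Attribution via ViT Gradient Decomposition), a mathematically grounded attribution method for ViTs. DAVE is based on a structured decomposition of the input gradient and exploits architectural properties of ViTs to isolate the locally equivariant and stable components of the input-output mapping, separating them from architecture-induced artifacts and other sources of instability.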

Original Abstract

Vision Transformers (ViTs) have become a dominant architecture in computer vision, yet producing stable and high-resolution attribution maps for these models remains challenging. Architectural components such as patch embeddings and attention routing often introduce structured artifacts in pixel-level explanations, causing many existing methods to rely on coarse patch-level attributions. We introduce DAVE (Distribution-aware Attribution via ViT Gradient Decomposition), a mathematically grounded attribution method for ViTs based on a structured decomposition of the input gradient. By exploiting architectural properties of ViTs, DAVE isolates locally equivariant and stable components of the effective input-output mapping. It separates these from architecture-induced artifacts and other sources of instability.
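
The abstract does not spell out DAVE's decomposition, so the following is only a minimal sketch of the quantity it starts from: the plain input gradient used as a pixel-level attribution map. Everything here is an assumption made for illustration, not the authors' method: the toy model (`toy_score`, a linear patch embedding followed by tanh, mean pooling, and a linear head), the helper `input_gradient`, the finite-difference estimate, and the random weights are all hypothetical.

```python
import math
import random

def toy_score(img, w_embed, w_head, p):
    """Toy stand-in for a ViT: linear patch embedding, tanh, mean pooling, linear head."""
    d = len(w_head)
    pooled = [0.0] * d
    n_patches = 0
    for i in range(0, len(img), p):
        for j in range(0, len(img[0]), p):
            # Flatten one non-overlapping p x p patch in row-major order.
            patch = [img[i + a][j + b] for a in range(p) for b in range(p)]
            for k in range(d):
                z = sum(x * w_embed[m][k] for m, x in enumerate(patch))
                pooled[k] += math.tanh(z)
            n_patches += 1
    return sum(pooled[k] / n_patches * w_head[k] for k in range(d))

def input_gradient(img, w_embed, w_head, p, eps=1e-5):
    """Central finite-difference estimate of d(score)/d(pixel) for every pixel."""
    grad = [[0.0] * len(img[0]) for _ in img]
    for r in range(len(img)):
        for c in range(len(img[0])):
            original = img[r][c]
            img[r][c] = original + eps
            up = toy_score(img, w_embed, w_head, p)
            img[r][c] = original - eps
            down = toy_score(img, w_embed, w_head, p)
            img[r][c] = original  # restore the pixel before moving on
            grad[r][c] = (up - down) / (2 * eps)
    return grad

# Hypothetical 4x4 "image", 2x2 patches, 3-dimensional embedding.
random.seed(0)
p, d = 2, 3
img = [[random.gauss(0, 1) for _ in range(4)] for _ in range(4)]
w_embed = [[random.gauss(0, 1) for _ in range(d)] for _ in range(p * p)]
w_head = [random.gauss(0, 1) for _ in range(d)]

# Pixel-level attribution map with the same shape as the image.
saliency = input_gradient(img, w_embed, w_head, p)
# A common variant: gradient multiplied elementwise by the input.
grad_x_input = [[g * x for g, x in zip(g_row, x_row)]
                for g_row, x_row in zip(saliency, img)]
```

In this toy, every pixel's gradient passes through the same `w_embed` matrix, so the map inherits the patch grid of the embedding. That is a loose analogue of the patch-embedding artifacts the abstract mentions, not a reproduction of them. A real ViT would use automatic differentiation rather than finite differences.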
