2603.08639v1 Mar 09, 2026 cs.CV


UNBOX: Unveiling Black-box visual models with Natural-language

Quentin Bouniot

Zeynep Akata

M. Pennisi

C. Spampinato

Simone Carnemolla

C. Russo

S. Palazzo

Daniela Giordano


Original Abstract

Ensuring trustworthiness in open-world visual recognition requires models that are interpretable, fair, and robust to distribution shifts. Yet modern vision systems are increasingly deployed as proprietary black-box APIs, exposing only output probabilities and hiding architecture, parameters, gradients, and training data. This opacity prevents meaningful auditing, bias detection, and failure analysis. Existing explanation methods assume white- or gray-box access or knowledge of the training distribution, making them unusable in these real-world settings. We introduce UNBOX, a framework for class-wise model dissection under fully data-free, gradient-free, and backpropagation-free constraints. UNBOX leverages Large Language Models and text-to-image diffusion models to recast activation maximization as a purely semantic search driven by output probabilities. The method produces human-interpretable text descriptors that maximally activate each class, revealing the concepts a model has implicitly learned, the training distribution it reflects, and potential sources of bias. We evaluate UNBOX on ImageNet-1K, Waterbirds, and CelebA through semantic fidelity tests, visual-feature correlation analyses and slice-discovery auditing. Despite operating under the strictest black-box constraints, UNBOX performs competitively with state-of-the-art white-box interpretability methods. This demonstrates that meaningful insight into a model's internal reasoning can be recovered without any internal access, enabling more trustworthy and accountable visual recognition systems.
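The abstract describes the method only at a high level. As a rough illustration of what "activation maximization recast as a purely semantic search driven by output probabilities" can look like, the sketch below runs a greedy beam search over text descriptors. Each descriptor is scored by the mean probability a probability-only classifier assigns to the target class on images generated from it. Everything here is an assumption, not the authors' algorithm: `toy_generate`, `toy_black_box` and `toy_propose` are hypothetical toy stand-ins for the diffusion model, the black-box API and the LLM, and the beam-search loop is just one simple search strategy.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Toy stand-in for an image: the set of visual "concepts" it contains.
Image = FrozenSet[str]


def toy_generate(descriptor: str, seed: int) -> Image:
    """Hypothetical stand-in for a text-to-image diffusion model.
    The toy 'image' is just the set of words in the prompt; the seed is
    accepted for interface realism but ignored here."""
    return frozenset(descriptor.lower().split())


def toy_black_box(image: Image) -> Dict[str, float]:
    """Hypothetical probability-only API: the only output is a softmax over
    classes. This toy model deliberately leans on a spurious 'water' cue."""
    logits = {
        "waterbird": 2.0 * ("water" in image) + 1.0 * ("duck" in image),
        "landbird": 2.0 * ("forest" in image) + 1.0 * ("sparrow" in image),
    }
    norm = sum(math.exp(v) for v in logits.values())
    return {c: math.exp(v) / norm for c, v in logits.items()}


def toy_propose(parents: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for an LLM refining the best descriptors:
    each parent is extended with one concept it does not yet mention."""
    vocab = ["water", "forest", "duck", "sparrow", "sky"]
    return [f"{p} {w}" for p in parents for w in vocab if w not in p.split()]


def score(descriptor: str, target: str, generate, black_box, n_samples: int = 4) -> float:
    """Mean target-class probability over several generated images."""
    probs = [black_box(generate(descriptor, seed=s))[target] for s in range(n_samples)]
    return sum(probs) / n_samples


def semantic_search(target: str, seeds: Sequence[str], propose, generate, black_box,
                    rounds: int = 2, beam: int = 2) -> List[Tuple[str, float]]:
    """Gradient-free beam search over text descriptors: uses only output
    probabilities, never weights, gradients or training data."""
    scored = {d: score(d, target, generate, black_box) for d in seeds}
    for _ in range(rounds):
        # Keep the best descriptors so far and ask the proposer to refine them.
        parents = sorted(scored, key=scored.get, reverse=True)[:beam]
        for child in propose(parents):
            if child not in scored:  # skip repeats: every score costs API queries
                scored[child] = score(child, target, generate, black_box)
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:beam]


if __name__ == "__main__":
    seeds = ["a bird", "a photo of a bird"]
    for desc, p in semantic_search("waterbird", seeds, toy_propose, toy_generate, toy_black_box):
        print(f"{p:.3f}  {desc}")
```

Run as a script, this prints `0.953  a bird water duck` first. In this toy setting the top "waterbird" descriptors all mention `water`, which is the kind of signal that would point an auditor at a background shortcut like the one Waterbirds is built to expose. A real system would replace the three toy functions with actual model calls and a far richer proposal step.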
