2603.07343v1 Mar 07, 2026 cs.LG

Learning Concept Bottleneck Models from Mechanistic Explanations

Antonio De Santis

Politecnico di Milano

Schrasing Tong

Marco Brambilla

Lalana Kagal

Original Abstract

Concept Bottleneck Models (CBMs) aim for ante-hoc interpretability by learning a bottleneck layer that predicts interpretable concepts before the decision. State-of-the-art approaches typically select which concepts to learn via human specification, open knowledge graphs, prompting an LLM, or using general CLIP concepts. However, concepts defined a priori may not have sufficient predictive power for the task, or may not even be learnable from the available data. As a result, these CBMs often significantly trail their black-box counterparts when controlling for information leakage. To address this, we introduce a novel CBM pipeline named Mechanistic CBM (M-CBM), which builds the bottleneck directly from a black-box model's own learned concepts. These concepts are extracted via Sparse Autoencoders (SAEs) and subsequently named and annotated on a selected subset of images using a Multimodal LLM. For fair comparison and leakage control, we also introduce the Number of Contributing Concepts (NCC), a decision-level sparsity metric that extends the recently proposed NEC metric. Across diverse datasets, we show that M-CBMs consistently surpass prior CBMs at matched sparsity, while improving concept predictions and providing concise explanations. Our code is available at https://github.com/Antonio-Dee/M-CBM.
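
For readers unfamiliar with the concept-extraction step, the following is a minimal sketch of a generic ReLU sparse autoencoder forward pass with an L1 sparsity penalty. It is not taken from the M-CBM repository: the function name `sae_forward`, the dimensions, the random weights, and the penalty coefficient are all assumptions chosen for illustration, and the abstract does not specify which SAE variant the authors use.

```python
import random

def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec, l1_coef=1e-3):
    """Forward pass of a generic ReLU sparse autoencoder (illustrative only)."""
    centered = [a - b for a, b in zip(x, b_dec)]
    pre_act = [h + b for h, b in zip(matvec(W_enc, centered), b_enc)]
    z = [max(0.0, h) for h in pre_act]            # non-negative, ideally sparse codes
    x_hat = [r + b for r, b in zip(matvec(W_dec, z), b_dec)]
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    loss = recon + l1_coef * sum(z)               # z >= 0, so sum(z) is its L1 norm
    return z, x_hat, loss

random.seed(0)
d_in, d_dict = 4, 12                              # hypothetical sizes
x = [random.gauss(0, 1) for _ in range(d_in)]     # stand-in for a model activation
W_enc = [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_dict)]
W_dec = [[random.gauss(0, 0.1) for _ in range(d_dict)] for _ in range(d_in)]
b_enc = [0.0] * d_dict
b_dec = [0.0] * d_in

z, x_hat, loss = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
print(len(z), len(x_hat), loss)
```

In a setup like this, each coordinate of `z` plays the role of a candidate concept activation; how such units are then selected, named, and annotated is described only at a high level in the abstract.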
