2602.23947v1 Feb 27, 2026 cs.LG

Hierarchical Concept-based Interpretable Models

M. Jamnik

Citations: 3,049

h-index: 25

M. Zarlenga

Citations: 518

h-index: 11

O. Hill

Citations: 6

h-index: 2

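Modern deep neural networks are hard to interpret because their latent representations are opaque, which complicates model understanding, debugging, and debiasing. Concept Embedding Models (CEMs) address this by mapping inputs to human-understandable concept representations and predicting the task from them. However, CEMs cannot express relationships between concepts and need concept annotations at several levels of granularity during training, which limits where they can be applied. In this paper, we introduce Hierarchical Concept Embedding Models (HiCEMs), a new family of CEMs that explicitly model concept relationships through a hierarchical structure. To make HiCEMs usable in real-world settings, we propose Concept Splitting, a method that automatically discovers finer-grained sub-concepts in the embedding space of a pretrained CEM without additional annotations. This lets HiCEMs produce fine-grained explanations from limited concept labels and reduces the annotation workload. Evaluation across multiple datasets, including a user study and experiments on PseudoKitchens, a dataset of 3D kitchen renders, shows that (1) Concept Splitting discovers human-understandable sub-concepts that were absent during training and can be used to train highly accurate HiCEMs, and (2) HiCEMs support powerful test-time concept interventions at different granularities, improving task accuracy.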

Original Abstract

Modern deep neural networks remain challenging to interpret due to the opacity of their latent representations, impeding model understanding, debugging, and debiasing. Concept Embedding Models (CEMs) address this by mapping inputs to human-interpretable concept representations from which tasks can be predicted. Yet, CEMs fail to represent inter-concept relationships and require concept annotations at different granularities during training, limiting their applicability. In this paper, we introduce Hierarchical Concept Embedding Models (HiCEMs), a new family of CEMs that explicitly model concept relationships through hierarchical structures. To enable HiCEMs in real-world settings, we propose Concept Splitting, a method for automatically discovering finer-grained sub-concepts from a pretrained CEM's embedding space without requiring additional annotations. This allows HiCEMs to generate fine-grained explanations from limited concept labels, reducing annotation burdens. Our evaluation across multiple datasets, including a user study and experiments on PseudoKitchens, a newly proposed concept-based dataset of 3D kitchen renders, demonstrates that (1) Concept Splitting discovers human-interpretable sub-concepts absent during training that can be used to train highly accurate HiCEMs, and (2) HiCEMs enable powerful test-time concept interventions at different granularities, leading to improved task accuracy.
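As a rough illustration of what a test-time concept intervention means for an embedding-based concept model, the sketch below mixes a "concept active" and a "concept inactive" embedding by the predicted concept probability, and lets an expert overwrite that probability. This follows the mixing idea of the original CEM formulation; it is not the HiCEM architecture, whose details are not given on this page. All function and variable names are hypothetical, and the embeddings are hard-coded toy values rather than learned ones.

```python
from typing import Dict, List, Optional


def mix_concept_embedding(p: float, pos: List[float], neg: List[float]) -> List[float]:
    """Blend the 'active' and 'inactive' embeddings: p * pos + (1 - p) * neg."""
    return [p * a + (1.0 - p) * b for a, b in zip(pos, neg)]


def concept_bottleneck(
    probs: List[float],
    pos_embs: List[List[float]],
    neg_embs: List[List[float]],
    interventions: Optional[Dict[int, float]] = None,
) -> List[float]:
    """Concatenate the mixed embeddings of all concepts.

    `interventions` maps a concept index to an expert-provided probability
    (assumed here to be 0.0 or 1.0) that replaces the model's prediction.
    """
    interventions = interventions or {}
    bottleneck: List[float] = []
    for i, p in enumerate(probs):
        # Use the expert's value when this concept is intervened on.
        p_used = interventions.get(i, p)
        bottleneck.extend(mix_concept_embedding(p_used, pos_embs[i], neg_embs[i]))
    return bottleneck


# Toy example: two concepts, each with 2-dimensional embeddings.
probs = [0.5, 0.2]
pos_embs = [[1.0, 0.0], [2.0, 2.0]]
neg_embs = [[0.0, 1.0], [0.0, 0.0]]

without_fix = concept_bottleneck(probs, pos_embs, neg_embs)
with_fix = concept_bottleneck(probs, pos_embs, neg_embs, interventions={1: 1.0})
```

In a full model, the bottleneck vector would be passed to a task predictor, so correcting a mispredicted concept (concept 1 above) changes the input that the predictor sees. A hierarchical variant would presumably allow the same kind of correction on either a parent concept or one of its sub-concepts, but that wiring is an assumption and is not shown here.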

2 Citations

0 Influential

12.5 Altmetric

64.5 Score
