2601.08703v1 Jan 13, 2026 cs.AI

Evaluating the Ability of Explanations to Disambiguate Models in a Rashomon Set

Kai Rawal

Eoin Delaney

Zihao Fu

Chris Russell

Sandra Wachter

Abstract

Explainable artificial intelligence (XAI) is concerned with producing explanations that indicate the inner workings of models. For a Rashomon set of similarly performing models, explanations provide a way of disambiguating the behavior of individual models, helping to select models for deployment. However, explanations themselves can vary depending on the explainer used, so they need to be evaluated. In the paper "Evaluating Model Explanations without Ground Truth", we proposed three principles of explanation evaluation and a new method, "AXE", to evaluate the quality of feature-importance explanations. We go on to illustrate how evaluation metrics that rely on comparing model explanations against ideal ground truth explanations obscure behavioral differences within a Rashomon set. Explanation evaluation aligned with our proposed principles would instead highlight these differences, helping to select models from the Rashomon set. Selecting alternate models from the Rashomon set can maintain identical predictions while misleading explainers into generating false explanations, and misleading evaluation methods into considering those false explanations to be of high quality. AXE, our proposed explanation evaluation method, can detect this adversarial fairwashing of explanations with a 100% success rate. Unlike prior explanation evaluation strategies, such as those based on model sensitivity or ground truth comparison, AXE can determine when protected attributes are used to make predictions.
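
The central idea, that two models in a Rashomon set can make identical predictions while relying on different features, can be illustrated with a small toy sketch. This is not the AXE method and not the authors' experimental setup: the dataset, the two hand-written models (`model_a`, `model_b`), and the helper names are all hypothetical, and plain permutation importance is used as a stand-in explainer. The sketch assumes a binary protected attribute and a proxy feature that is perfectly correlated with it.

```python
import random

def make_data(n=200, seed=0):
    """Toy dataset (assumption): columns are [protected, proxy, noise]."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        protected = rng.randint(0, 1)
        proxy = protected              # proxy is perfectly correlated with the protected attribute
        noise = rng.random()           # irrelevant feature
        rows.append([protected, proxy, noise])
        labels.append(protected)       # label depends only on the shared signal
    return rows, labels

def model_a(row):
    """Hypothetical model that reads the protected attribute directly."""
    return row[0]

def model_b(row):
    """Hypothetical model that reads only the proxy feature."""
    return row[1]

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, col, seed=1):
    """Accuracy drop after shuffling one column (stand-in explainer, not AXE)."""
    rng = random.Random(seed)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    return accuracy(model, rows, labels) - accuracy(model, permuted, labels)

if __name__ == "__main__":
    rows, labels = make_data()
    for name, model in [("model_a", model_a), ("model_b", model_b)]:
        acc = accuracy(model, rows, labels)
        imps = [permutation_importance(model, rows, labels, c) for c in range(3)]
        print(name, "accuracy:", acc, "importances [protected, proxy, noise]:", imps)
```

Both toy models agree on every prediction in this dataset, so accuracy alone cannot separate them; only the per-feature importances reveal that one of them depends on the protected attribute. Comparing either explanation against a single "ground truth" importance vector would hide exactly this difference, which is the failure mode the abstract describes.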
