2604.08502v1 Apr 09, 2026 cs.CV

Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification

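Class Activation Mapping (CAM) methods are widely used to generate visual explanations for deep learning classifiers in medical imaging. Existing evaluation frameworks, however, assess whether explanations are correct, measured as localisation fidelity against radiologist annotations, and not whether they are consistent, that is, whether the model applies the same spatial reasoning strategy across different patients with the same pathology. This work proposes the C-Score (Consistency Score), a confidence-weighted, annotation-free metric that quantifies intra-class explanation reproducibility using intensity-emphasised pairwise soft IoU across correctly classified instances. Six CAM techniques (GradCAM, GradCAM++, LayerCAM, EigenCAM, ScoreCAM, and MS GradCAM++) are evaluated on three CNN architectures (DenseNet201, InceptionV3, ResNet50V2) over thirty training epochs on the Kermany chest X-ray dataset, covering the transfer learning and fine-tuning phases. The study identifies three mechanisms of AUC-consistency dissociation that standard classification metrics do not reveal: threshold-mediated gold list collapse, technique-specific attribution collapse at peak AUC, and class-level consistency masking in global aggregation. C-Score also provides an early warning signal of model instability: ScoreCAM deterioration on ResNet50V2 is detectable one full checkpoint before catastrophic AUC collapse. The metric further supports architecture-specific clinical deployment recommendations grounded in explanation quality and not in predictive ranking alone.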

Original Abstract

Class Activation Mapping (CAM) methods are widely used to generate visual explanations for deep learning classifiers in medical imaging. However, existing evaluation frameworks assess whether explanations are correct, measured by localisation fidelity against radiologist annotations, rather than whether they are consistent: whether the model applies the same spatial reasoning strategy across different patients with the same pathology. We propose the C-Score (Consistency Score), a confidence-weighted, annotation-free metric that quantifies intra-class explanation reproducibility via intensity-emphasised pairwise soft IoU across correctly classified instances. We evaluate six CAM techniques: GradCAM, GradCAM++, LayerCAM, EigenCAM, ScoreCAM, and MS GradCAM++ across three CNN architectures (DenseNet201, InceptionV3, ResNet50V2) over thirty training epochs on the Kermany chest X-ray dataset, covering transfer learning and fine-tuning phases. We identify three distinct mechanisms of AUC-consistency dissociation, invisible to standard classification metrics: threshold-mediated gold list collapse, technique-specific attribution collapse at peak AUC, and class-level consistency masking in global aggregation. C-Score provides an early warning signal of impending model instability. ScoreCAM deterioration on ResNet50V2 is detectable one full checkpoint before catastrophic AUC collapse and yields architecture-specific clinical deployment recommendations grounded in explanation quality rather than predictive ranking alone.
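The abstract describes the C-Score only in words, so the following is a minimal sketch of what a confidence-weighted, intensity-emphasised pairwise soft IoU could look like, not the paper's actual formula. Several choices here are assumptions: heatmaps are taken to be normalised to [0, 1], intensity emphasis is modelled as a per-pixel power `gamma`, each pair is weighted by the product of the two prediction confidences, and the function names `soft_iou` and `consistency_score` are hypothetical.

```python
from itertools import combinations


def soft_iou(map_a, map_b, gamma=2.0):
    """Soft IoU of two heatmaps in [0, 1], after raising each pixel to `gamma`.

    ASSUMPTION: gamma > 1 stands in for "intensity emphasis". It shrinks weak
    activations so that overlap between strongly activated regions dominates.
    """
    flat_a = [pixel ** gamma for row in map_a for pixel in row]
    flat_b = [pixel ** gamma for row in map_b for pixel in row]
    # Soft intersection and union: element-wise min and max, summed over pixels.
    intersection = sum(min(a, b) for a, b in zip(flat_a, flat_b))
    union = sum(max(a, b) for a, b in zip(flat_a, flat_b))
    return intersection / union if union > 0 else 0.0


def consistency_score(heatmaps, confidences, gamma=2.0):
    """Confidence-weighted mean pairwise soft IoU for one class.

    `heatmaps` holds CAMs of correctly classified instances of a single class,
    and `confidences` holds their predicted probabilities for that class.
    ASSUMPTION: each pair is weighted by the product of its two confidences.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for i, j in combinations(range(len(heatmaps)), 2):
        weight = confidences[i] * confidences[j]
        weighted_sum += weight * soft_iou(heatmaps[i], heatmaps[j], gamma)
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else 0.0


# Toy 2x2 heatmaps: the same hot pixel versus a hot pixel in a different place.
focused = [[0.0, 1.0], [0.0, 0.0]]
shifted = [[0.0, 0.0], [1.0, 0.0]]
print(consistency_score([focused, focused], [0.9, 0.8]))  # identical maps
print(consistency_score([focused, shifted], [0.9, 0.8]))  # disjoint maps
```

Under these assumptions the score needs no radiologist annotations: it compares a model's heatmaps only with each other, which is the annotation-free property the abstract emphasises. Identical maps score at the top of the range and non-overlapping maps at the bottom.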
