2603.24828v1 Mar 25, 2026 cs.LG

A Practical Guide Towards Interpreting Time-Series Deep Clinical Predictive Models: A Reproducibility Study

John Wu (Citations: 17, h-index: 1)
Jimeng Sun (Citations: 12, h-index: 2)
Yong Fan (Citations: 23, h-index: 2)
A. Fitzpatrick (Citations: 0, h-index: 0)
Naveen Baskaran (Citations: 27, h-index: 2)
Adam Cross (Citations: 187, h-index: 7)

Original Abstract

Clinical decisions are high-stakes and require explicit justification, making model interpretability essential for auditing deep clinical models prior to deployment. As the ecosystem of model architectures and explainability methods expands, critical questions remain: Do architectural features like attention improve explainability? Do interpretability approaches generalize across clinical tasks? While prior benchmarking efforts exist, they often lack extensibility and reproducibility and, critically, fail to systematically examine how interpretability varies across the interplay of clinical tasks and model architectures. To address these gaps, we present a comprehensive benchmark evaluating interpretability methods across diverse clinical prediction tasks and model architectures. Our analysis reveals that: (1) attention, when leveraged properly, is a highly efficient approach for faithfully interpreting model predictions; (2) black-box interpreters like KernelSHAP and LIME are computationally infeasible for time-series clinical prediction tasks; and (3) several interpretability approaches are too unreliable to be trustworthy. From our findings, we discuss several guidelines on improving interpretability within clinical predictive pipelines. To support reproducibility and extensibility, we provide our implementations via PyHealth, a well-documented open-source framework: https://github.com/sunlabuiuc/PyHealth.
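
To give a feel for finding (2), the following back-of-the-envelope sketch counts the model forward passes a perturbation-based explainer might need for a single time-series prediction. It is not taken from the paper or from PyHealth. The function name, the input sizes, and the background-set size are hypothetical, and the `2 * M + 2048` sampling budget is assumed here as a typical KernelSHAP-style heuristic rather than a figure the authors report.

```python
def kernel_shap_forward_passes(n_timesteps, n_features, n_background, n_coalitions=None):
    """Rough count of forward passes needed to explain ONE prediction.

    Assumption: every (time step, feature) cell is treated as a separate
    input to attribute, and every sampled coalition is evaluated against
    every background sample.
    """
    m = n_timesteps * n_features  # number of inputs to attribute
    if n_coalitions is None:
        # Assumed sampling budget (a common KernelSHAP-style heuristic).
        n_coalitions = 2 * m + 2048
    return n_coalitions * n_background


# Hypothetical sizes: 48 hourly steps, 76 clinical features, 100 background samples.
passes = kernel_shap_forward_passes(n_timesteps=48, n_features=76, n_background=100)
print(f"Estimated forward passes per explained prediction: {passes:,}")
```

Under these assumed sizes the estimate is close to a million forward passes per explained patient, whereas reading attention weights or computing a gradient needs roughly one forward (and possibly one backward) pass. That gap is one plausible reading of why the abstract calls black-box interpreters computationally infeasible in this setting.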
