2604.05348v1 Apr 07, 2026 cs.AI

Safe Decisions Grounded in Retinal Evidence: RETINA-SAFE and ECRT for Assessing Hallucination Risk in Medical LLMs

From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs

Meng Han (Citations: 227, h-index: 9)

Wenpeng Xing (Citations: 205, h-index: 10)

Zhenqiang Yu (Citations: 1, h-index: 1)

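Hallucinations in medical large language models (LLMs) pose a serious safety problem, particularly when the available evidence is insufficient or conflicting. This work addresses the problem in diabetic retinopathy (DR) decision settings and proposes RETINA-SAFE, an evidence-grounded benchmark aligned with retinal grading records. RETINA-SAFE comprises 12,522 samples organized into three evidence-relation tasks: E-Align (evidence-consistent), E-Conflict (evidence-conflicting), and E-Gap (evidence-insufficient). The work also proposes ECRT (Evidence-Conditioned Risk Triage), a two-stage white-box detection framework: Stage 1 classifies each case as Safe or Unsafe, and Stage 2 subdivides the cases flagged as unsafe into contradiction-driven risk and evidence-gap risk. ECRT uses shifts in internal representations and logits under CTX/NOCTX conditions and is trained with class balancing. On evidence-grouped splits (which are not patient-disjoint) across multiple backbones, ECRT delivers strong Stage-1 risk triage and explicit subtype attribution. It improves Stage-1 balanced accuracy by +0.15 to +0.19 over external uncertainty and self-consistency baselines and by +0.02 to +0.07 over the strongest adapted supervised baseline, and it consistently outperforms a single-stage white-box ablation on Stage-1 balanced accuracy. These results suggest that white-box internal signals grounded in retinal evidence are a practical route to interpretable medical LLM risk triage.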

Original Abstract

Hallucinations in medical large language models (LLMs) remain a safety-critical issue, particularly when available evidence is insufficient or conflicting. We study this problem in diabetic retinopathy (DR) decision settings and introduce RETINA-SAFE, an evidence-grounded benchmark aligned with retinal grading records, comprising 12,522 samples. RETINA-SAFE is organized into three evidence-relation tasks: E-Align (evidence-consistent), E-Conflict (evidence-conflicting), and E-Gap (evidence-insufficient). We further propose ECRT (Evidence-Conditioned Risk Triage), a two-stage white-box detection framework: Stage 1 performs Safe/Unsafe risk triage, and Stage 2 refines unsafe cases into contradiction-driven versus evidence-gap risks. ECRT leverages internal representation and logit shifts under CTX/NOCTX conditions, with class-balanced training for robust learning. Under evidence-grouped (not patient-disjoint) splits across multiple backbones, ECRT provides strong Stage-1 risk triage and explicit subtype attribution, improves Stage-1 balanced accuracy by +0.15 to +0.19 over external uncertainty and self-consistency baselines and by +0.02 to +0.07 over the strongest adapted supervised baseline, and consistently exceeds a single-stage white-box ablation on Stage-1 balanced accuracy. These findings support white-box internal signals grounded in retinal evidence as a practical route to interpretable medical LLM risk triage.
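
The two-stage flow and the balanced-accuracy metric named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that CTX and NOCTX mean running the model with and without the evidence context, and the function names (`logit_shift`, `triage`, `balanced_accuracy`) and the toy threshold rules are hypothetical placeholders standing in for the trained classifiers.

```python
from typing import Callable, List, Sequence


def logit_shift(ctx_logits: Sequence[float], noctx_logits: Sequence[float]) -> List[float]:
    """Per-option logit difference between the CTX and NOCTX runs.

    Assumption: CTX = prompt with evidence, NOCTX = prompt without it.
    """
    return [c - n for c, n in zip(ctx_logits, noctx_logits)]


def triage(
    features: Sequence[float],
    stage1_is_unsafe: Callable[[Sequence[float]], bool],
    stage2_subtype: Callable[[Sequence[float]], str],
) -> str:
    """Stage 1 decides Safe vs. Unsafe; Stage 2 runs only on unsafe cases."""
    if not stage1_is_unsafe(features):
        return "safe"
    return stage2_subtype(features)  # "contradiction" or "evidence_gap"


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall, so a rare class counts as much as a common one."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical stand-ins for the trained Stage-1 and Stage-2 classifiers.
def toy_stage1(features: Sequence[float]) -> bool:
    return max(abs(x) for x in features) > 1.0


def toy_stage2(features: Sequence[float]) -> str:
    return "contradiction" if features[0] < 0 else "evidence_gap"


shift = logit_shift([2.0, 1.0], [0.5, 1.5])
label = triage(shift, toy_stage1, toy_stage2)
```

Balanced accuracy is the natural metric here because a detector that labels every case Safe would still score high plain accuracy on an imbalanced set, while its balanced accuracy would fall to 0.5.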
