arXiv:2604.18753v1 | Apr 20, 2026 | cs.LG

Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling

Ritambhara Singh

A. Wang

Ellie Pavlick

Original Abstract

An active challenge in developing multimodal machine learning (ML) models for healthcare is handling missing modalities during training and deployment. Because clinical datasets are inherently temporal and sparse in terms of modality presence, capturing the underlying predictive signal with diagnostic multimodal ML models while retaining model explainability remains difficult. In this work, we address this by reframing clinical diagnosis as an autoregressive sequence modeling task, using causal decoders from large language models (LLMs) to model a patient's multimodal trajectory. We first introduce a missingness-aware contrastive pre-training objective that integrates multiple modalities into a shared latent space, even when some modalities are missing from the dataset. We then show that autoregressive sequence modeling with transformer-based architectures outperforms baselines on the MIMIC-IV and eICU fine-tuning benchmarks. Finally, we use interpretability techniques to move beyond performance gains and find that, across various patient stays, removing modalities leads to divergent behavior, which our contrastive pre-training mitigates. By abstracting clinical diagnosis as sequence modeling and interpreting patient-stay trajectories, we develop a framework to profile and handle missing modalities while addressing the canonical desideratum of safe, transparent clinical AI.
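The abstract does not spell out the form of its missingness-aware contrastive objective. One common way to make a CLIP-style contrastive loss tolerate missing modalities is to compute it only over patients for whom both modalities in a pair are observed, and to skip a pair entirely when too few patients share it. The sketch below assumes exactly that masking scheme, a symmetric InfoNCE loss, a temperature of 0.1, and the hypothetical function name `masked_pairwise_infonce` and modality names (`ehr`, `notes`, `cxr`). It is an illustration of the general idea, not the authors' objective.

```python
import numpy as np


def _l2_normalize(x, eps=1e-8):
    # Project each row onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def _log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def masked_pairwise_infonce(embeddings, present, temperature=0.1):
    """Hypothetical missingness-aware contrastive loss (an assumption, not the paper's).

    embeddings: dict mapping modality name -> (B, D) array of per-patient embeddings.
                Rows for patients lacking that modality may hold arbitrary values.
    present:    dict mapping modality name -> (B,) bool array, True where observed.
    Returns the mean symmetric InfoNCE over all modality pairs, where each pair
    uses only the patients who have both modalities.
    """
    names = sorted(embeddings)
    losses = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            both = present[a] & present[b]
            if both.sum() < 2:
                continue  # need at least one negative example to contrast against
            za = _l2_normalize(embeddings[a][both])
            zb = _l2_normalize(embeddings[b][both])
            logits = za @ zb.T / temperature  # row i, column j: patient i (a) vs patient j (b)
            diag = np.arange(len(za))  # matching patients sit on the diagonal
            loss_ab = -_log_softmax(logits, axis=1)[diag, diag].mean()  # a -> b retrieval
            loss_ba = -_log_softmax(logits, axis=0)[diag, diag].mean()  # b -> a retrieval
            losses.append(0.5 * (loss_ab + loss_ba))
    # No pair had enough co-observed patients: contribute nothing to the total loss.
    return float(np.mean(losses)) if losses else 0.0
```

Because absent rows are filtered out before normalization, whatever placeholder values a pipeline stores for a missing modality never enter the loss, and a modality that is missing for every patient in a batch simply contributes no pairs.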
