2604.17725v1 Apr 20, 2026 cs.CL

RePrompT: Recurrent Prompt Tuning for Integrating Structured EHR Encoders with Large Language Models

Dongjie Wang, Arya Hadizadeh Moghaddam, Mohsen Nayebi Kerdabadi, D. Ross, Zijun Yao

Abstract

Large Language Models (LLMs) have shown strong promise for mining Electronic Health Records (EHRs) by reasoning over longitudinal clinical information to capture context-rich patient trajectories. However, leveraging LLMs for structured EHRs (e.g., standardized diagnosis and medication codes) presents two key challenges. First, translating time-stamped EHR sequences into plain text can obscure both temporal structure and code identities, weakening the ability to capture code co-occurrence and longitudinal regularities. Second, unlike cohort-trained predictive models that learn a shared, task-aligned representation space across patients, LLMs are often applied in a case-isolated inference setting where each patient is processed independently without leveraging population-level patterns. To address these challenges, we introduce RePrompT, a time-aware LLM framework that integrates structured EHR encoders through prompt tuning, without modifying underlying architectures. Specifically, RePrompT recurrently incorporates latent states from prior visits to preserve longitudinal information, and injects population-level information through trainable prompt tokens derived from a cohort-trained, task-aligned EHR encoder. Experiments on MIMIC-III and MIMIC-IV demonstrate that RePrompT consistently outperforms both EHR-based and LLM-based baselines across multiple clinical prediction tasks.
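
The abstract names the two ingredients (a recurrent state carried across visits, and trainable prompt tokens derived from a cohort-trained EHR encoder) but not their exact form. The sketch below is one hypothetical way to wire them together and is not the authors' implementation. Everything concrete in it is an assumption: mean-pooled code embeddings as the visit encoder, a plain tanh RNN update h_t = tanh(W_x x_t + W_h h_{t-1}) as the recurrence, and a linear projection W_p that turns the final state into k soft prompt tokens. The class name RecurrentPromptSketch and the parameters d_enc, d_llm, and k_prompts are illustrative only. The LLM is not modeled; text_embeddings stands in for the output of its frozen input-embedding layer, and training is omitted.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; stands in for parameters learned on a cohort."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class RecurrentPromptSketch:
    """Hypothetical stand-in: an EHR code encoder with an RNN-style state over
    visits, projected into k soft prompt tokens for a frozen LLM."""

    def __init__(self, vocab_size, d_enc, d_llm, k_prompts, seed=0):
        rng = random.Random(seed)
        self.d_enc, self.d_llm, self.k = d_enc, d_llm, k_prompts
        self.code_emb = rand_matrix(vocab_size, d_enc, rng)    # one row per code
        self.W_x = rand_matrix(d_enc, d_enc, rng)              # visit vector -> state
        self.W_h = rand_matrix(d_enc, d_enc, rng)              # previous state -> state
        self.W_p = rand_matrix(k_prompts * d_llm, d_enc, rng)  # state -> k prompt tokens

    def encode_visit(self, codes):
        # Mean-pool the embeddings of the codes recorded in one (non-empty) visit.
        vecs = [self.code_emb[c] for c in codes]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def step(self, h_prev, codes):
        # h_t = tanh(W_x x_t + W_h h_{t-1}) carries longitudinal context forward.
        x = self.encode_visit(codes)
        pre = [a + b for a, b in zip(matvec(self.W_x, x), matvec(self.W_h, h_prev))]
        return [math.tanh(v) for v in pre]

    def prompt_tokens(self, h):
        # Project the state into k vectors of the LLM's embedding width.
        flat = matvec(self.W_p, h)
        return [flat[i * self.d_llm:(i + 1) * self.d_llm] for i in range(self.k)]

    def build_inputs(self, visits, text_embeddings):
        # Run the recurrence over visits in time order, then prepend the
        # soft prompt tokens to the text token embeddings.
        h = [0.0] * self.d_enc
        for codes in visits:
            h = self.step(h, codes)
        return self.prompt_tokens(h) + text_embeddings


if __name__ == "__main__":
    model = RecurrentPromptSketch(vocab_size=50, d_enc=8, d_llm=16, k_prompts=4)
    visits = [{3, 7}, {7, 12, 40}, {1}]      # code ids per visit, oldest first
    text = [[0.0] * 16 for _ in range(5)]    # placeholder embeddings for 5 tokens
    inputs = model.build_inputs(visits, text)
    print(len(inputs), len(inputs[0]))       # 9 16
```

In a prompt-tuning setup of this kind, only the encoder, recurrence, and projection weights would receive gradients while the LLM stays frozen, which matches the abstract's claim of not modifying the underlying architectures. Because the state is updated visit by visit, reordering the same visits changes the resulting prompt tokens, which is the temporal sensitivity a flat text serialization can lose.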