2605.14723v1 May 14, 2026 cs.AI

Modeling Patient State Changes within LLMs: Developing Agents through Interaction with a Clinical World Model

Agentifying Patient Dynamics within LLMs through Interacting with Clinical World Model

Xidong Wang

Citations: 1,064

h-index: 11

Shuang Li

Citations: 84

h-index: 2

Benyou Wang

Citations: 3

h-index: 1

Rongsheng Wang

Citations: 313

h-index: 5

Minghao Wu

Citations: 44

h-index: 2

Zhenyang Cai

Citations: 710

h-index: 7

Ke Ji

Citations: 836

h-index: 9

Yuting Yan

Citations: 6

h-index: 1

Chuangsen Fang

Citations: 14

h-index: 2

Ziying Sheng

Citations: 7

h-index: 1

Hejia Zhang

Citations: 511

h-index: 6

Hongyuan Zha

Citations: 81

h-index: 5

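Managing sepsis in the intensive care unit (ICU) requires sequential treatment decisions that track rapidly changing patient physiology. Large language models (LLMs) encode broad clinical knowledge and can reason over guidelines, but they are not inherently grounded in how a patient's state changes in response to an action. This work introduces SepsisAgent, a world-model-augmented LLM agent for sepsis treatment recommendation. SepsisAgent uses a learned Clinical World Model to simulate patient responses to candidate fluid and vasopressor interventions, and follows a propose-simulate-refine workflow before committing to a prescription. The authors first show that world-model access alone yields inconsistent LLM decision performance, which motivates agent-specific training. SepsisAgent is then trained through a three-stage curriculum: first, supervised fine-tuning on patient dynamics; second, behavior cloning of the propose-simulate-refine behavior; and third, world-model-based agentic reinforcement learning. On MIMIC-IV sepsis trajectories, SepsisAgent outperforms the traditional RL and LLM-based baselines in off-policy value and achieves the best safety profile on guideline-adherence and unsafe-action metrics. Further analysis shows that repeated interaction with the Clinical World Model lets the agent learn regularities in patient evolution, and that these learned regularities remain useful even when simulator access is removed.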

Original Abstract

Sepsis management in the ICU requires sequential treatment decisions under rapidly evolving patient physiology. Although large language models (LLMs) encode broad clinical knowledge and can reason over guidelines, they are not inherently grounded in action-conditioned patient dynamics. We introduce SepsisAgent, a world model-augmented LLM agent for sepsis treatment recommendation. SepsisAgent uses a learned Clinical World Model to simulate patient responses under candidate fluid--vasopressor interventions, and follows a propose--simulate--refine workflow before committing to a prescription. We first show that world-model access alone yields inconsistent LLM decision performance, motivating agent-specific training. We then train SepsisAgent through a three-stage curriculum: patient-dynamics supervised fine-tuning, propose--simulate--refine behavior cloning, and world-model-based agentic reinforcement learning. On MIMIC-IV sepsis trajectories, SepsisAgent outperforms all traditional RL and LLM-based baselines in off-policy value while achieving the best safety profile under guideline adherence and unsafe-action metrics. Further analysis shows that repeated interaction with the Clinical World Model enables the agent to learn regularities in patient evolution, which remain useful even when simulator access is removed.
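The propose-simulate-refine workflow described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `PatientState` fields, the 0 to 4 discretization of fluid and vasopressor levels, the invented coefficients in `toy_world_model`, the `toy_score` function, and the rule-based `toy_propose` that stands in for the LLM. None of the numbers carry clinical meaning.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical action encoding: (fluid_level, vasopressor_level), each in 0..4.
Action = Tuple[int, int]


@dataclass
class PatientState:
    mean_arterial_pressure: float  # mmHg
    lactate: float                 # mmol/L


def toy_world_model(state: PatientState, action: Action) -> PatientState:
    """Stand-in for a learned world model; coefficients are invented."""
    fluid, vaso = action
    return PatientState(
        mean_arterial_pressure=state.mean_arterial_pressure + 2.0 * fluid + 4.0 * vaso,
        lactate=max(0.0, state.lactate - 0.1 * fluid),
    )


def toy_score(state: PatientState) -> float:
    """Hypothetical score: stay near an arbitrary pressure target, keep lactate low."""
    return -abs(state.mean_arterial_pressure - 70.0) - state.lactate


def toy_propose(state: PatientState, previous_best: Optional[Action]) -> List[Action]:
    """Stand-in for the LLM proposer: coarse guesses first, then neighbors of the best."""
    if previous_best is None:
        return [(0, 0), (2, 2), (4, 4)]
    f, v = previous_best
    return [
        (f + df, v + dv)
        for df in (-1, 0, 1)
        for dv in (-1, 0, 1)
        if 0 <= f + df <= 4 and 0 <= v + dv <= 4
    ]


def propose_simulate_refine(
    state: PatientState,
    propose: Callable[[PatientState, Optional[Action]], List[Action]],
    world_model: Callable[[PatientState, Action], PatientState],
    score: Callable[[PatientState], float],
    rounds: int = 2,
) -> Optional[Action]:
    """Propose candidates, simulate each one, then refine around the best so far."""
    best_action: Optional[Action] = None
    best_score = float("-inf")
    candidates = propose(state, None)
    for _ in range(rounds):
        for action in candidates:
            simulated = world_model(state, action)   # simulate before committing
            value = score(simulated)
            if value > best_score:
                best_action, best_score = action, value
        candidates = propose(state, best_action)     # refine around the current best
    return best_action


if __name__ == "__main__":
    start = PatientState(mean_arterial_pressure=55.0, lactate=3.0)
    print(propose_simulate_refine(start, toy_propose, toy_world_model, toy_score))
```

The sketch only shows the control flow: no action is returned until its simulated outcome has been scored. In the paper's setting the proposer is a trained LLM and the simulator is learned from patient trajectories, neither of which this toy version attempts to reproduce.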

