2604.23054v1 Apr 24, 2026 cs.CL

DeepImagine: Learning Biomedical Reasoning via Successive Counterfactual Imagining

Jianyou Wang

Citations: 202

h-index: 5

Youze Zheng

UC San Diego

Citations: 10

h-index: 2

Longtian Bao

UC San Diego

Citations: 10

h-index: 2

Maxim Khan

Citations: 4

h-index: 1

A. Sehgal

Citations: 176

h-index: 7

Christopher D. Rosin

Citations: 11

h-index: 3

R. Paturi

Citations: 6,418

h-index: 30

U. Dube

Citations: 3

h-index: 1

Yuhan Chen

Citations: 0

h-index: 0

M. Feng

Citations: 40

h-index: 3

Han Zhang

Citations: 11

h-index: 2

Original Abstract

Predicting the outcomes of prospective clinical trials remains a major challenge for large language models. Prior work has shown that both traditional correlational predictors, such as random forests and logistic regression, and strong commercial LLMs achieve limited performance on this task. In this paper, we propose DeepImagine, a framework for teaching LLMs biomedical reasoning through successive counterfactual imagining. The central idea is to approximate hidden causal mechanisms of clinical trials by training models to infer how observed trial results would change under controlled perturbations of experimental conditions, such as dosage, outcome measures, study arms, geography, and other trial attributes. To support this objective, we construct both natural and approximate counterfactual pairs from real clinical trials with reported outcomes. For settings where strict counterfactual supervision is available, such as paired outcome measures or dose-ranging study arms within the same trial, we train models with supervised fine-tuning. For broader settings where only approximate counterfactual pairs can be retrieved, we optimize models with reinforcement learning using verifiable rewards based on downstream benchmark correctness. We further augment training with synthetic reasoning traces that provide causally plausible explanations for local counterfactual transitions. Using this pipeline, we train language models under 10B parameters, including Qwen3.5-9B, and evaluate them on clinical trial outcome prediction. We aim to show that DeepImagine consistently improves over untuned language models and traditional correlational baselines. Finally, we aim to show that the learned reasoning trajectories provide interpretable signals about how models represent trial-level mechanisms, suggesting a practical path toward more mechanistic and scientifically useful biomedical language models.
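The abstract's two training signals can be illustrated with a small sketch: natural counterfactual pairs built from dose-ranging arms within the same trial, and a binary reward that can be checked against the reported outcome. This is only an illustration under stated assumptions, not the authors' pipeline. The `Arm` record, the function names, and the choice to score the direction of change are all hypothetical; the abstract says only that rewards are based on downstream benchmark correctness.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Arm:
    """Hypothetical record for one study arm (not a schema from the paper)."""
    trial_id: str
    dose_mg: float
    outcome: float  # reported value of a single outcome measure


def natural_pairs(arms):
    """Pair arms of the same trial that differ in dose.

    Within one trial the other conditions are held fixed, so each pair acts
    as a natural counterfactual: "what if the dose had been different?"
    """
    by_trial = {}
    for arm in arms:
        by_trial.setdefault(arm.trial_id, []).append(arm)
    pairs = []
    for trial_arms in by_trial.values():
        for source, target in combinations(trial_arms, 2):
            if source.dose_mg != target.dose_mg:
                pairs.append((source, target))
    return pairs


def direction(source, target):
    """Label how the outcome moves when going from source to target."""
    if target.outcome > source.outcome:
        return "increase"
    if target.outcome < source.outcome:
        return "decrease"
    return "no change"


def verifiable_reward(predicted, source, target):
    """Return 1.0 if the predicted direction matches the reported one, else 0.0.

    Assumption: a direction-of-change check stands in for the benchmark
    correctness signal mentioned in the abstract.
    """
    return 1.0 if predicted == direction(source, target) else 0.0


arms = [
    Arm("T1", dose_mg=10, outcome=0.20),
    Arm("T1", dose_mg=20, outcome=0.35),
    Arm("T2", dose_mg=5, outcome=0.50),
]
pairs = natural_pairs(arms)
```

Here only the two `T1` arms form a pair, because `T2` has a single arm. In a supervised setting the label from `direction` would serve as the target; in a reinforcement learning setting `verifiable_reward` would score a model's sampled prediction.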
