2601.18132v1 Jan 26, 2026 cs.AI

RareAlert: Aligning heterogeneous large language model reasoning for early rare disease risk screening

Xi Chen (Citations: 11, h-index: 1)
Huahui Yi (Citations: 80, h-index: 5)
Hanyu Zhou (Citations: 35, h-index: 3)
M. You (Citations: 612, h-index: 8)
Li Wang (Citations: 514, h-index: 5)
W. Fu (Citations: 1,068, h-index: 21)
Kang Li (Citations: 5, h-index: 1)
Jian Li (Citations: 537, h-index: 6)
Hong Zhou (Citations: 115, h-index: 3)
Shiyu Feng (Citations: 1, h-index: 1)
Kun Wang (Citations: 93, h-index: 4)
T. He (Citations: 615, h-index: 14)
Qiankun Li (Citations: 517, h-index: 9)

Abstract

Missed and delayed diagnosis remains a major challenge in rare disease care. At initial clinical encounters, physicians assess rare disease risk using only limited information under high uncertainty. When high-risk patients are not recognised at this stage, targeted diagnostic testing is often not initiated, resulting in missed diagnoses. Existing primary care triage processes are structurally insufficient to reliably identify patients with rare diseases at initial clinical presentation, and universal screening is needed to reduce diagnostic delay. Here we present RareAlert, an early screening system that predicts patient-level rare disease risk from routinely available primary-visit information. RareAlert integrates reasoning generated by ten LLMs, calibrates and weights these signals using machine learning, and distils the aligned reasoning into a single locally deployable model. To develop and evaluate RareAlert, we curated RareBench, a real-world dataset of 158,666 cases covering 33 Orphanet disease categories and more than 7,000 rare conditions, including both rare and non-rare presentations. The results showed that rare disease identification can be reconceptualised as a universal uncertainty resolution process applied to the general patient population. On an independent test set, RareAlert, a Qwen3-4B-based model trained with calibrated reasoning signals, achieved an AUC of 0.917, outperforming the best machine learning ensemble and all evaluated LLMs, including GPT-5, DeepSeek-R1, Claude-3.7-Sonnet, o3-mini, Gemini-2.5-Pro, and Qwen3-235B. These findings demonstrate the diversity in LLM medical reasoning and the effectiveness of aligning such reasoning in highly uncertain clinical tasks. By incorporating calibrated reasoning into a single model, RareAlert enables accurate, privacy-preserving, and scalable rare disease risk screening suitable for large-scale local deployment.
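
The abstract does not specify how the per-LLM signals are calibrated and weighted, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each LLM emits a risk probability per case, fuses them as a weighted sum of log-odds, and scores the result with AUC. The function names, the fixed weights, and the toy data are all hypothetical; in practice the weights would be learned on held-out data, and the distillation step is not shown.

```python
import math


def auc(labels, scores):
    """AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def combine(model_scores, weights, bias=0.0):
    """Hypothetical fusion rule: weighted sum of per-model log-odds, mapped back to [0, 1]."""
    z = bias + sum(w * logit(p) for w, p in zip(weights, model_scores))
    return 1 / (1 + math.exp(-z))


# Toy data (made up for illustration): 1 = rare disease, 0 = non-rare.
labels = [1, 1, 0, 0]
# One row per case, one column per hypothetical LLM (A, B).
per_model = [
    [0.9, 0.6],
    [0.4, 0.8],
    [0.5, 0.3],
    [0.2, 0.7],
]
weights = [1.0, 1.0]  # assumed equal; a real system would fit these

scores_a = [row[0] for row in per_model]
scores_b = [row[1] for row in per_model]
fused = [combine(row, weights) for row in per_model]

print("AUC model A:", auc(labels, scores_a))
print("AUC model B:", auc(labels, scores_b))
print("AUC fused:  ", auc(labels, fused))
```

On this toy data each model misranks a different case, so the fused score separates the classes better than either model alone. This is the intuition behind combining diverse reasoning signals, though real gains depend on the data and on how the weights are fitted.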
