2603.13673v1 Mar 14, 2026 cs.AI

LLM-MINE: Large Language Model based Alzheimer's Disease and Related Dementias Phenotypes Mining from Clinical Notes

Yuzhang Xie (Citations: 85, h-index: 5)

Mingchen Shao (Citations: 92, h-index: 5)

Carl Yang (Citations: 30, h-index: 3)

Jiaying Lu (Citations: 193, h-index: 7)

Original Abstract

Accurate extraction of Alzheimer's Disease and Related Dementias (ADRD) phenotypes from electronic health records (EHR) is critical for early-stage detection and disease staging. However, this information is usually embedded in unstructured text rather than tabular data, which makes accurate extraction difficult. We therefore propose LLM-MINE, a Large Language Model-based phenotype mining framework for automatic extraction of ADRD phenotypes from clinical notes. Using two expert-defined phenotype lists, we evaluate the extracted phenotypes by examining their statistical significance across cohorts and their utility for unsupervised disease staging. Chi-square analyses confirm statistically significant phenotype differences across cohorts, with memory impairment being the strongest discriminator. Few-shot prompting with the combined phenotype lists achieves the best clustering performance (ARI=0.290, NMI=0.232), substantially outperforming biomedical NER and dictionary-based baselines. Our results demonstrate that LLM-based phenotype extraction is a promising tool for discovering clinically meaningful ADRD signals from unstructured notes.
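
The abstract names two evaluation steps: a chi-square test of phenotype frequency across cohorts, and clustering quality measured by the Adjusted Rand Index (ARI). The sketch below shows how each quantity can be computed with only the Python standard library. It is not the authors' code. The function names and all counts and labels are hypothetical, chosen only to illustrate the arithmetic.

```python
from collections import Counter
from math import comb


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table.

    `table` is a list of rows, e.g. one row per cohort and one column per
    outcome (phenotype present / absent). Assumes every row and column
    total is non-zero, so that every expected count is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of cohort and phenotype.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree
    # Count item pairs that share a cluster in both, or in each, labeling.
    sum_pairs = sum(comb(n, 2) for n in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(n, 2) for n in Counter(labels_true).values())
    sum_pred = sum(comb(n, 2) for n in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use one cluster
    return (sum_pairs - expected) / (max_index - expected)


# Hypothetical counts: rows are two cohorts, columns are
# (notes mentioning a phenotype, notes not mentioning it).
toy_table = [[30, 10], [10, 30]]
print(chi_square_statistic(toy_table))

# Hypothetical stage labels versus cluster assignments for four patients.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

ARI is invariant to how clusters are named, so the second call compares partitions rather than label values. In practice, `scipy.stats.chi2_contingency` and `sklearn.metrics.adjusted_rand_score` provide tested implementations, and the SciPy function also returns a p-value.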

0 Citations

0 Influential

3.5 Altmetric

17.5 Score
