2604.05190v1 Apr 06, 2026 cs.CL

Improving Clinical Trial Recruitment using Clinical Narratives and Large Language Models

Mengxian Lyu (Citations: 25, h-index: 3)
Cheng Peng (Citations: 796, h-index: 8)
Ziyi Chen, University of Florida (Citations: 47, h-index: 3)
Yonghui Wu (Citations: 5, h-index: 1)

Abstract

Screening patients for enrollment is a well-known, labor-intensive bottleneck that leads to under-enrollment and, ultimately, trial failures. Recent breakthroughs in large language models (LLMs) offer a promising opportunity to improve screening with artificial intelligence. This study systematically explored encoder-based LLMs and decoder-based generative LLMs for screening clinical narratives to facilitate clinical trial recruitment. We examined both general-purpose and medical-adapted LLMs and explored three strategies to alleviate the "Lost in the Middle" issue when handling long documents: 1) Original long-context: using the default context windows of LLMs; 2) NER-based extractive summarization: converting the long document into a summary using named entity recognition; 3) RAG: dynamic evidence retrieval based on eligibility criteria. We used the 2018 N2C2 Track 1 benchmark dataset for evaluation. Our experimental results show that the MedGemma model with the RAG strategy achieved the best micro-F1 score of 89.05%, outperforming the other models. Generative LLMs remarkably improved performance on trial criteria that require long-term reasoning across long documents, whereas criteria that span a short piece of context (e.g., lab tests) showed only incremental improvements. Real-world adoption of LLMs for trial recruitment must therefore consider, for each specific criterion, whether to use rule-based queries, encoder-based LLMs, or generative LLMs, so as to maximize efficiency within reasonable computing costs.
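The third strategy, retrieving evidence per eligibility criterion, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the retriever, chunking scheme, or prompt, so the bag-of-words cosine scoring, the two-sentence chunks, the prompt wording, and all function names below (`tokenize`, `chunk_sentences`, `retrieve`, `build_prompt`) are assumptions chosen only to show the general shape of the idea. A real system would likely use a learned embedding model in place of the lexical scorer.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_sentences(document: str, size: int = 2) -> list[str]:
    """Split a note into sentences, then group them into chunks of `size` sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(criterion: str, document: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the criterion text (lexical stand-in for a retriever)."""
    query = Counter(tokenize(criterion))
    chunks = chunk_sentences(document)
    ranked = sorted(chunks, key=lambda c: cosine(query, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(criterion: str, evidence: list[str]) -> str:
    """Assemble a hypothetical screening prompt from the criterion and retrieved evidence."""
    joined = "\n".join(f"- {chunk}" for chunk in evidence)
    return (
        f"Eligibility criterion: {criterion}\n"
        f"Evidence from the clinical narrative:\n{joined}\n"
        "Does the patient meet this criterion? Answer 'met' or 'not met'."
    )
```

The point of the design is that the model sees only the few chunks relevant to one criterion rather than the whole record, which is how retrieval sidesteps the "Lost in the Middle" issue; the prompt would then be passed to whichever generative LLM is under evaluation.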

