2601.07344v1 Jan 12, 2026 cs.CV

PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis

Jian Wang (citations: 77, h-index: 4)
Jiao Xu (citations: 5, h-index: 1)
Xin Chen (citations: 5, h-index: 1)
Lihe Zhang (citations: 5, h-index: 1)
Junwei Liu (citations: 207, h-index: 3)
Jiangwei Lao (citations: 1,155, h-index: 8)
Qipeng Zhu (citations: 1, h-index: 1)
Yunpeng Zhao (citations: 3, h-index: 1)
Congyun Jin (citations: 21, h-index: 2)
Shinan Liu (citations: 189, h-index: 7)
Zhihong Lu (citations: 1, h-index: 1)
Ping Wang (citations: 171, h-index: 2)

Original Abstract

Recent advances in medical multi-modal models focus on specialized image analysis like dermatology, pathology, or radiology. However, they do not fully capture the complexity of real-world clinical diagnostics, which involve heterogeneous inputs and require ongoing contextual understanding during patient-physician interactions. To bridge this gap, we introduce PulseMind, a new family of multi-modal diagnostic models that integrates a systematically curated dataset, a comprehensive evaluation benchmark, and a tailored training framework. Specifically, we first construct a diagnostic dataset, MediScope, which comprises 98,000 real-world multi-turn consultations and 601,500 medical images, spanning over 10 major clinical departments and more than 200 sub-specialties. Then, to better reflect the requirements of real-world clinical diagnosis, we develop the PulseMind Benchmark, a multi-turn diagnostic consultation benchmark with a four-dimensional evaluation protocol comprising proactiveness, accuracy, usefulness, and language quality. Finally, we design a training framework tailored for multi-modal clinical diagnostics, centered around a core component named Comparison-based Reinforcement Policy Optimization (CRPO). Compared to absolute score rewards, CRPO uses relative preference signals from multi-dimensional comparisons to provide stable and human-aligned training guidance. Extensive experiments demonstrate that PulseMind achieves competitive performance on both the diagnostic consultation benchmark and public medical benchmarks.
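
The abstract contrasts absolute score rewards with relative preference signals drawn from multi-dimensional comparisons, but it does not give the CRPO formulation. The sketch below is only one plausible reading of that idea, not the authors' algorithm: each candidate response in a sampled group is compared pairwise against the others on the four benchmark dimensions, its win rate serves as the reward, and rewards are normalized within the group. The function names, the `prefer` judge interface, the tie handling, and the group normalization step are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev

# The four evaluation dimensions named in the abstract.
DIMENSIONS = ("proactiveness", "accuracy", "usefulness", "language_quality")


def pairwise_win_rates(candidates, prefer):
    """Turn pairwise, per-dimension comparisons into one reward per candidate.

    `prefer(a, b, dim)` is a hypothetical judge interface: it returns a
    positive number if `a` is preferred on `dim`, a negative number if `b`
    is preferred, and 0 for a tie. The reward is the fraction of
    (opponent, dimension) comparisons a candidate wins; ties count as half.
    """
    n = len(candidates)
    if n < 2:
        raise ValueError("need at least two candidates to compare")
    wins = [0.0] * n
    for i, j in combinations(range(n), 2):
        for dim in DIMENSIONS:
            outcome = prefer(candidates[i], candidates[j], dim)
            if outcome > 0:
                wins[i] += 1.0
            elif outcome < 0:
                wins[j] += 1.0
            else:
                wins[i] += 0.5
                wins[j] += 0.5
    comparisons_per_candidate = (n - 1) * len(DIMENSIONS)
    return [w / comparisons_per_candidate for w in wins]


def group_advantages(rewards, eps=1e-8):
    """Center and scale rewards within the group (an assumed normalization)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_judge(a, b, dim):
    """Stand-in judge that compares made-up per-dimension ratings.

    In practice the comparison would come from a judge model or a human
    annotator; this exists only so the sketch runs end to end.
    """
    return a[dim] - b[dim]


def make_candidate(level):
    """Build a toy candidate rated `level` on every dimension."""
    return {dim: level for dim in DIMENSIONS}
```

With such a scheme the policy update sees only how a response ranks against its siblings, so a judge whose absolute scores drift between prompts has less influence on the training signal. Whether CRPO aggregates comparisons this way is not stated in the abstract.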

Citations: 1 | Influential: 0 | Altmetric: 4 | Score: 21.0
