2601.10880v1 Jan 15, 2026 cs.CV

Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation

Yihua Shao (Citations: 8, h-index: 1)
Chongcong Jiang (Citations: 19, h-index: 2)
Tianxingjian Ding (Citations: 13, h-index: 2)
Chuhan Song (Citations: 9, h-index: 1)
Jiachen Tu (Citations: 267, h-index: 13)
Ziyang Yan (Citations: 178, h-index: 8)
Zhenyi Wang (Citations: 51, h-index: 5)
Yuzhang Shang (Citations: 26, h-index: 3)
Tianyu Han (Citations: 8, h-index: 1)
Yu Tian (Citations: 1,589, h-index: 15)

Abstract

Promptable segmentation foundation models such as SAM3 have demonstrated strong generalization capabilities through interactive and concept-based prompting. However, their direct applicability to medical image segmentation remains limited by severe domain shifts, the absence of privileged spatial prompts, and the need to reason over complex anatomical and volumetric structures. Here we present Medical SAM3, a foundation model for universal prompt-driven medical image segmentation, obtained by fully fine-tuning SAM3 on large-scale, heterogeneous 2D and 3D medical imaging datasets with paired segmentation masks and text prompts. Through a systematic analysis of vanilla SAM3, we observe that its performance degrades substantially on medical data, with its apparent competitiveness largely relying on strong geometric priors such as ground-truth-derived bounding boxes. These findings motivate full model adaptation beyond prompt engineering alone. By fine-tuning SAM3's model parameters on 33 datasets spanning 10 medical imaging modalities, Medical SAM3 acquires robust domain-specific representations while preserving prompt-driven flexibility. Extensive experiments across organs, imaging modalities, and dimensionalities demonstrate consistent and significant performance gains, particularly in challenging scenarios characterized by semantic ambiguity, complex morphology, and long-range 3D context. Our results establish Medical SAM3 as a universal, text-guided segmentation foundation model for medical imaging and highlight the importance of holistic model adaptation for achieving robust prompt-driven segmentation under severe domain shift. Code and model will be made available at https://github.com/AIM-Research-Lab/Medical-SAM3.
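The abstract's point about ground-truth-derived bounding boxes can be illustrated with a small sketch. This is not the authors' code: the helper names (`bbox_from_mask`, `box_to_mask`, `dice`) and the toy mask are hypothetical, and Dice is used here only as a common overlap measure, since the abstract does not name its metric. The sketch shows how a box computed from the reference mask already encodes most of the target's location and extent, which is why such a prompt counts as privileged information.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # binary mask stored as rows of 0/1 values


def bbox_from_mask(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Tight box (x_min, y_min, x_max, y_max) around foreground pixels, or None if empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def box_to_mask(box: Tuple[int, int, int, int], height: int, width: int) -> Mask:
    """Rasterize a box (inclusive corners) into a binary mask of the given size."""
    x0, y0, x1, y1 = box
    return [[1 if x0 <= x <= x1 and y0 <= y <= y1 else 0 for x in range(width)]
            for y in range(height)]


def dice(pred: Mask, target: Mask) -> float:
    """Dice overlap 2|A∩B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    inter = sum(1 for pr, tr in zip(pred, target) for p, t in zip(pr, tr) if p and t)
    total = (sum(1 for row in pred for v in row if v)
             + sum(1 for row in target for v in row if v))
    return 1.0 if total == 0 else 2.0 * inter / total


# Toy reference mask (hypothetical data, 4 rows x 5 columns).
gt = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

box = bbox_from_mask(gt)                          # box derived from the ground truth
box_only = box_to_mask(box, len(gt), len(gt[0]))  # "prediction" that just fills the box
score = dice(box_only, gt)                        # overlap achieved with no model at all
```

On this toy mask, simply filling the ground-truth-derived box overlaps heavily with the target. That is consistent with the abstract's observation that results obtained with such boxes can overstate what a model achieves from text prompts alone.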

8 Citations

1 Influential

53.352419975191 Altmetric

276.8 Score
