2602.03320v1 Feb 03, 2026 cs.CV

MedSAM-Agent: Empowering Interactive Medical Image Segmentation with Multi-turn Agentic Reinforcement Learning

Shengyuan Liu, Liuxin Bao, Qi Yang, Wanting Geng, Boyun Zheng, Chenxin Li, Wenting Chen, Houwen Peng, Yixuan Yuan

Original Abstract

Medical image segmentation is evolving from task-specific models toward generalizable frameworks. Recent research leverages Multi-modal Large Language Models (MLLMs) as autonomous agents, employing reinforcement learning with verifiable reward (RLVR) to orchestrate specialized tools like the Segment Anything Model (SAM). However, these approaches often rely on single-turn, rigid interaction strategies and lack process-level supervision during training, which hinders their ability to fully exploit the dynamic potential of interactive tools and leads to redundant actions. To bridge this gap, we propose MedSAM-Agent, a framework that reformulates interactive segmentation as a multi-step autonomous decision-making process. First, we introduce a hybrid prompting strategy for expert-curated trajectory generation, enabling the model to internalize human-like decision heuristics and adaptive refinement strategies. Furthermore, we develop a two-stage training pipeline that integrates multi-turn, end-to-end outcome verification with a clinical-fidelity process reward design to promote interaction parsimony and decision efficiency. Extensive experiments across 6 medical modalities and 21 datasets demonstrate that MedSAM-Agent achieves state-of-the-art performance, effectively unifying autonomous medical reasoning with robust, iterative optimization. Code is available at https://github.com/CUHK-AIM-Group/MedSAM-Agent.
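The abstract describes the training signal only at a high level: an outcome check on the final segmentation combined with a process reward that favours parsimonious interaction. The sketch below is a hypothetical illustration of that general idea, not the authors' reward design. It scores one multi-turn episode by the Dice of the final mask (outcome term) plus the per-turn Dice gain, minus a fixed cost per tool call and an extra penalty for turns that barely improve the mask (process term). Every name and weight here (`episode_reward`, `step_cost`, `min_gain`, `redundancy_penalty`, `process_weight`) is an assumption, and masks are flattened 0/1 lists so the example needs no dependencies.

```python
from typing import List, Sequence

# Hypothetical representation: a flattened binary mask (1 = foreground, 0 = background).
Mask = Sequence[int]


def dice(pred: Mask, target: Mask) -> float:
    """Dice coefficient between two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks are treated as perfect agreement.
    return 1.0 if total == 0 else 2.0 * inter / total


def episode_reward(
    masks: List[Mask],
    target: Mask,
    step_cost: float = 0.05,          # hypothetical cost charged for every tool call
    min_gain: float = 0.01,           # hypothetical threshold for a "useful" turn
    redundancy_penalty: float = 0.1,  # hypothetical extra cost for a non-improving turn
    process_weight: float = 0.5,      # hypothetical weight of the process term
) -> float:
    """Hypothetical outcome + process reward for one multi-turn episode.

    masks[i] is the mask returned after the agent's i-th tool call
    (for example, one prompt sent to a SAM-style model).
    """
    if not masks:
        return 0.0
    # Outcome term: quality of the final mask only.
    outcome = dice(masks[-1], target)
    # Process term: start from the Dice of an empty prediction.
    prev = dice([0] * len(target), target)
    process = 0.0
    for mask in masks:
        cur = dice(mask, target)
        gain = cur - prev
        process += gain - step_cost
        if gain < min_gain:
            # Extra penalty for turns that improve Dice by less than min_gain.
            process -= redundancy_penalty
        prev = cur
    return outcome + process_weight * process


if __name__ == "__main__":
    target = [1, 1, 1, 1, 0, 0, 0, 0]
    concise = [[1, 1, 1, 1, 0, 0, 0, 0]]
    redundant = [
        [1, 1, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 0, 0, 0, 0],  # repeats the previous result
    ]
    print(round(episode_reward(concise, target), 3))
    print(round(episode_reward(redundant, target), 3))
```

Because the per-turn gains telescope to the final Dice minus the starting Dice, the process term in this sketch mainly charges for the number of turns and for redundant ones, so a trajectory that reaches the same mask in fewer calls scores higher. With the default weights, the one-call trajectory in the demo scores about 1.475 versus about 1.375 for the three-call trajectory that repeats its last result.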
