2603.09018v1 Mar 09, 2026 cs.AI

Meissa: Multi-modal Medical Agentic Intelligence

Alan L. Yuille (Citations: 563, h-index: 13)

Yixiong Chen (Citations: 169, h-index: 6)

Xinyi Bai (Citations: 25, h-index: 2)

Yue Pan (Citations: 33, h-index: 3)

Zongwei Zhou (Citations: 764, h-index: 16)

Abstract

Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However, these systems rely almost entirely on frontier models (e.g., GPT), whose API-based deployment incurs high cost, high latency, and privacy risks that conflict with on-premise clinical requirements. We present Meissa, a lightweight 4B-parameter medical MM-LLM that brings agentic capability offline. Instead of imitating static answers, Meissa learns both when to engage external interaction (strategy selection) and how to execute multi-step interaction (strategy execution) by distilling structured trajectories from frontier models. Specifically, we propose: (1) Unified trajectory modeling: trajectories (reasoning and action traces) are represented within a single state-action-observation formalism, allowing one model to generalize across heterogeneous medical environments. (2) Three-tier stratified supervision: the model's own errors trigger progressive escalation from direct reasoning to tool-augmented and multi-agent interaction, explicitly learning difficulty-aware strategy selection. (3) Prospective-retrospective supervision: pairing exploratory forward traces with hindsight-rationalized execution traces enables stable learning of effective interaction policies. Trained on 40K curated trajectories, Meissa matches or exceeds proprietary frontier agents in 10 of 16 evaluation settings across 13 medical benchmarks spanning radiology, pathology, and clinical reasoning. Using over 25x fewer parameters than typical frontier models like Gemini-3, Meissa operates fully offline with 22x lower end-to-end latency compared to API-based deployment. Data, models, and environments are released at https://github.com/Schuture/Meissa.
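The first two ideas in the abstract, a single state-action-observation record for every trajectory and escalation from direct reasoning to tool use to multi-agent interaction when the cheaper tier answers wrongly, can be illustrated with a minimal sketch. Every name below (`Step`, `Trajectory`, `escalate`, the toy solvers) is hypothetical and not taken from the Meissa repository. The sketch also assumes exact-match answer checking and that the first tier to answer correctly supplies the training label; the paper may define these differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    state: str        # what the model sees before acting
    action: str       # e.g. "answer", "call_tool:<name>", "ask_agent:<role>"
    observation: str  # what the environment returns ("" if nothing)


@dataclass
class Trajectory:
    question: str
    tier: str                          # which strategy produced this trace
    steps: List[Step] = field(default_factory=list)
    answer: Optional[str] = None


# Cheapest strategy first; escalation only happens after a wrong answer.
TIERS = ["direct", "tool", "multi_agent"]

Solver = Callable[[str], Trajectory]


def escalate(question: str, gold: str, solvers: Dict[str, Solver]) -> Optional[Trajectory]:
    """Return the trajectory from the first tier whose answer matches gold."""
    for tier in TIERS:
        traj = solvers[tier](question)
        if traj.answer == gold:  # assumption: exact-match correctness check
            return traj
    return None  # no tier solved it; this sketch drops the sample


# Toy stand-ins for a student model and two teacher pipelines (hypothetical).
def direct_solver(q: str) -> Trajectory:
    return Trajectory(q, "direct", [Step(q, "answer", "")], answer="B")


def tool_solver(q: str) -> Trajectory:
    steps = [
        Step(q, "call_tool:segmenter", "opacity in left lower lobe"),
        Step(q + " | opacity in left lower lobe", "answer", ""),
    ]
    return Trajectory(q, "tool", steps, answer="A")


def multi_agent_solver(q: str) -> Trajectory:
    steps = [Step(q, "ask_agent:radiologist", "suggests A"), Step(q, "answer", "")]
    return Trajectory(q, "multi_agent", steps, answer="A")


SOLVERS = {"direct": direct_solver, "tool": tool_solver, "multi_agent": multi_agent_solver}

picked = escalate("Which lobe shows consolidation?", gold="A", solvers=SOLVERS)
print(picked.tier, len(picked.steps))
```

Because all three tiers emit the same `Trajectory` shape, one model could be trained on the mixed set, and the `tier` field acts as the label for strategy selection in this toy setup.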

2 Citations

0 Influential

46.317808230648 Altmetric

233.6 Score
