2603.27341v1 Mar 28, 2026 cs.AI

A Comparative Study in Surgical AI: Datasets, Foundation Models, and Barriers to Med-AGI

K. Skobelev (Citations: 1; h-index: 1)
Y. Baranovski (Citations: 0; h-index: 0)
S. Otto (Citations: 0; h-index: 0)
D. Donoho (Citations: 51; h-index: 3)
X. Y. Han (Citations: 231; h-index: 3)
M. Masson-Forsythe (Citations: 24; h-index: 1)
Eric Fithian (Citations: 1; h-index: 1)
Jack Cook (Citations: 2; h-index: 1)
S. Angara (Citations: 149; h-index: 6)
Zhuang Yi (Citations: 79; h-index: 3)
John Zhu (Citations: 148; h-index: 4)
N. Mainkar (Citations: 18; h-index: 3)

Original Abstract

Recent Artificial Intelligence (AI) models have matched or exceeded human experts on several benchmarks of biomedical task performance, but have lagged behind on surgical image-analysis benchmarks. Since surgery requires integrating disparate tasks (including multimodal data integration, human interaction, and physical effects), generally capable AI models could be particularly attractive as a collaborative tool if performance could be improved. On the one hand, the canonical approach of scaling architecture size and training data is attractive, especially since millions of hours of surgical video data are generated per year. On the other hand, preparing surgical data for AI training requires significantly higher levels of professional expertise, and training on that data requires expensive computational resources. These trade-offs paint an uncertain picture of whether and to what extent modern AI could aid surgical practice. In this paper, we explore this question through a case study of surgical tool detection using state-of-the-art AI methods available in 2026. We demonstrate that even with multi-billion-parameter models and extensive training, current Vision Language Models fall short in the seemingly simple task of tool detection in neurosurgery. Additionally, we show scaling experiments indicating that increasing model size and training time leads only to diminishing improvements in relevant performance metrics. Thus, our experiments suggest that current models could still face significant obstacles in surgical use cases. Moreover, some obstacles cannot be simply "scaled away" with additional compute and persist across diverse model architectures, raising the question of whether data and label availability are the only limiting factors. We discuss the main contributors to these constraints and advance potential solutions.
