2605.03352v1 May 05, 2026 cs.CV

Can Multimodal Large Language Models Understand Pathologic Movements? A Pilot Study on Seizure Semiology

V. Roychowdhury, Lina Zhang, T. Monsoor, Mehmet E. Lorasdagi, Prateik Sinha, Peizheng Li, J. Pasqua, Colin M McCrimmon, Rajarshi Mazumder, Chong Han, Yuanzhi Wang

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated robust capabilities in recognizing everyday human activities, yet their potential for analyzing clinically significant involuntary movements in neurological disorders remains largely unexplored. This pilot study evaluates the capability of MLLMs for automated recognition of pathological movements in seizure videos. We assessed the zero-shot performance of state-of-the-art MLLMs on 20 ILAE-defined semiological features across 90 clinical seizure recordings. MLLMs outperformed fine-tuned Convolutional Neural Network (CNN) and Vision Transformer (ViT) baseline models on 13 of 18 features without task-specific training, demonstrating particular strength in recognizing salient postural and contextual features while struggling with subtle, high-frequency movements. Feature-targeted signal enhancement (facial cropping, pose estimation, audio denoising) improved performance on 10 of 20 features. Expert evaluation showed that 94.3 percent of MLLM-generated explanations for correctly predicted cases achieved at least 60 percent faithfulness scores, aligning with epileptologist reasoning. These findings demonstrate the potential of adapting general-purpose MLLMs for specialized clinical video analysis through targeted preprocessing strategies, offering a path toward interpretable, efficient diagnostic assistance. Our code is publicly available at https://github.com/LinaZhangUCLA/PathMotionMLLM.
