2604.15711v1 Apr 17, 2026 cs.CV

SSMamba: A Self-Supervised Hybrid State Space Model for Pathological Image Classification

Sicheng Chen

Enhui Chai

Xingyu Li

Tianxiang Cui

Tianyi Zhang

Original Abstract

Pathological diagnosis is highly reliant on image analysis, where Regions of Interest (ROIs) serve as the primary basis for diagnostic evidence, while whole-slide image (WSI)-level tasks primarily capture aggregated patterns. To extract these critical morphological features, ROI-level Foundation Models (FMs) based on Vision Transformers (ViTs) and large-scale self-supervised learning (SSL) have been widely adopted. However, three core limitations remain in their application to ROI analysis: (1) cross-magnification domain shift, as fixed-scale pretraining hinders adaptation to diverse clinical settings; (2) inadequate local-global relationship modeling, wherein the ViT backbone of FMs suffers from high computational overhead and imprecise local characterization; (3) insufficient fine-grained sensitivity, as traditional self-attention mechanisms tend to overlook subtle diagnostic cues. To address these challenges, we propose SSMamba, a hybrid SSL framework that enables effective fine-grained feature learning without relying on large external datasets. This framework incorporates three domain-adaptive components: Mamba Masked Image Modeling (MAMIM) for mitigating domain shift, a Directional Multi-scale (DMS) module for balanced local-global modeling, and a Local Perception Residual (LPR) module for enhanced fine-grained sensitivity. Employing a two-stage pipeline, SSL pretraining on target ROI datasets followed by supervised fine-tuning (SFT), SSMamba outperforms 11 state-of-the-art (SOTA) pathological FMs on 10 public ROI datasets and surpasses 8 SOTA methods on 6 public WSI datasets. These results validate the superiority of task-specific architectural designs for pathological image analysis.
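The first stage of the pipeline described in the abstract is a form of masked image modeling: some image patches are hidden and the model is trained to reconstruct them. The abstract does not say how MAMIM selects or reconstructs patches, so the following is only a generic sketch of the idea, not the authors' method. The function names `random_patch_mask` and `masked_mse`, the 75% mask ratio, and the toy 16-patch input are all assumptions made for illustration.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Return a list of booleans, True where a patch is hidden from the encoder.

    Hypothetical helper: uniform random masking, as in generic masked image
    modeling. The paper's MAMIM strategy may differ.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_mse(pred, target, mask):
    """Mean squared error computed over masked patches only.

    Each patch is a flat list of floats; visible patches do not contribute.
    """
    total, count = 0.0, 0
    for p, t, m in zip(pred, target, mask):
        if m:
            total += sum((a - b) ** 2 for a, b in zip(p, t))
            count += len(t)
    return total / count if count else 0.0


# Toy example: 16 patches (e.g. a 4x4 grid), each flattened to 3 values.
mask = random_patch_mask(num_patches=16, mask_ratio=0.75, seed=0)
target = [[float(i)] * 3 for i in range(16)]
pred = [[0.0] * 3 for _ in range(16)]  # stand-in for a decoder's output
loss = masked_mse(pred, target, mask)
```

In a two-stage setup like the one the abstract outlines, a loss of this kind would drive pretraining on unlabeled target ROI images, after which the encoder would be fine-tuned with labels using an ordinary classification loss.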
