2604.01612v1 Apr 02, 2026 cs.CV

NEMESIS: Noise-suppressed Efficient MAE with Enhanced Superpatch Integration Strategy

Youngung Han (Citations: 2, h-index: 1)
Kyeonghun Kim (Citations: 14, h-index: 1)
K. Liao (Citations: 4, h-index: 1)
Hye-Won Jung (Citations: 1, h-index: 1)
Hyunsu Go (Citations: 1, h-index: 1)
Eun-Won Choi (Citations: 0, h-index: 0)
Seongbin Park (Citations: 10, h-index: 2)
J. Lim (Citations: 0, h-index: 0)
Jiwon Yang (Citations: 1, h-index: 1)
Sumin Lee (Citations: 10, h-index: 2)
Insung Hwang (Citations: 1, h-index: 1)
N. Kim (Citations: 143, h-index: 5)

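Volumetric CT imaging is essential for clinical diagnosis, but annotating 3D data is costly and time-consuming, which makes self-supervised learning (SSL) from unlabeled data important. Applying SSL to 3D CT is nevertheless difficult with conventional masking strategies, owing to the high memory cost of full-volume transformers and the anisotropic spatial structure of CT data. This paper proposes NEMESIS, a masked autoencoder (MAE) framework that uses local 128x128x128 superpatches to enable memory-efficient training while preserving anatomical detail. NEMESIS comprises three key components: (i) noise-enhanced reconstruction as a pretext task, (ii) Masked Anatomical Transformer Blocks (MATB) that perform dual masking through parallel plane-wise and axis-wise token removal, and (iii) NEMESIS Tokens (NT) that aggregate context across scales. On the BTCV multi-organ classification benchmark, NEMESIS with a frozen backbone and a linear classifier achieves a mean AUROC of 0.9633, outperforming fully fine-tuned SuPreM (0.9493) and VoCo (0.9387). In a low-label setting using only 10% of the available annotations, it retains an AUROC of 0.9075, indicating strong label efficiency. The superpatch-based design also reduces computational cost to 31.0 GFLOPs per forward pass, compared with 985.8 GFLOPs for the full-volume baseline, providing a scalable and robust foundation for 3D medical imaging.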

Original Abstract

Volumetric CT imaging is essential for clinical diagnosis, yet annotating 3D volumes is expensive and time-consuming, motivating self-supervised learning (SSL) from unlabeled data. However, applying SSL to 3D CT remains challenging due to the high memory cost of full-volume transformers and the anisotropic spatial structure of CT data, which is not well captured by conventional masking strategies. We propose NEMESIS, a masked autoencoder (MAE) framework that operates on local 128x128x128 superpatches, enabling memory-efficient training while preserving anatomical detail. NEMESIS introduces three key components: (i) noise-enhanced reconstruction as a pretext task, (ii) Masked Anatomical Transformer Blocks (MATB) that perform dual-masking through parallel plane-wise and axis-wise token removal, and (iii) NEMESIS Tokens (NT) for cross-scale context aggregation. On the BTCV multi-organ classification benchmark, NEMESIS with a frozen backbone and a linear classifier achieves a mean AUROC of 0.9633, surpassing fully fine-tuned SuPreM (0.9493) and VoCo (0.9387). Under a low-label regime with only 10% of available annotations, it retains an AUROC of 0.9075, demonstrating strong label efficiency. Furthermore, the superpatch-based design reduces computational cost to 31.0 GFLOPs per forward pass, compared to 985.8 GFLOPs for the full-volume baseline, providing a scalable and robust foundation for 3D medical imaging.
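The abstract does not spell out how superpatches are cropped or how plane-wise and axis-wise token removal is implemented, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' method. It assumes a (depth, height, width) volume, non-overlapping cubic tokens, "plane-wise" meaning whole slices of the token grid along the first axis, and "axis-wise" meaning whole token columns running along that axis. The function names, the token size of 16, and the masking ratios are all hypothetical.

```python
import numpy as np


def crop_superpatch(volume, origin, size=128):
    """Crop a cubic size^3 superpatch from a (D, H, W) volume at `origin`."""
    z, y, x = origin
    patch = volume[z:z + size, y:y + size, x:x + size]
    if patch.shape != (size, size, size):
        raise ValueError("superpatch exceeds the volume bounds")
    return patch


def tokenize(superpatch, token=16):
    """Split a cubic superpatch into non-overlapping token^3 blocks.

    Returns an array of shape (g, g, g, token**3) with g = size // token.
    The token size is an assumption; the abstract does not state it.
    """
    size = superpatch.shape[0]
    g = size // token
    blocks = superpatch.reshape(g, token, g, token, g, token)
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)  # grid axes first, voxel axes last
    return blocks.reshape(g, g, g, token ** 3)


def plane_wise_mask(grid, ratio, rng):
    """Assumed plane-wise masking: drop whole token planes along axis 0."""
    n_planes = int(round(ratio * grid[0]))
    planes = rng.choice(grid[0], size=n_planes, replace=False)
    mask = np.zeros(grid, dtype=bool)
    mask[planes, :, :] = True  # True marks a removed token
    return mask


def axis_wise_mask(grid, ratio, rng):
    """Assumed axis-wise masking: drop whole token columns along axis 0."""
    _, gh, gw = grid
    n_cols = int(round(ratio * gh * gw))
    flat = rng.choice(gh * gw, size=n_cols, replace=False)
    ys, xs = np.unravel_index(flat, (gh, gw))
    mask = np.zeros(grid, dtype=bool)
    mask[:, ys, xs] = True  # True marks a removed token
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.standard_normal((160, 192, 192)).astype(np.float32)  # synthetic stand-in for a CT scan
    patch = crop_superpatch(volume, origin=(16, 32, 32))
    tokens = tokenize(patch)                       # shape (8, 8, 8, 4096)
    grid = tokens.shape[:3]
    plane_mask = plane_wise_mask(grid, 0.5, rng)   # applied to one branch
    axis_mask = axis_wise_mask(grid, 0.5, rng)     # applied to a parallel branch
    visible_plane = tokens[~plane_mask]            # tokens kept by the plane-wise branch
    visible_axis = tokens[~axis_mask]              # tokens kept by the axis-wise branch
    print(visible_plane.shape, visible_axis.shape)
```

In this sketch each mask would feed a separate encoder branch, which is one way to read "parallel" in the abstract. The noise-enhanced pretext task and the NEMESIS Tokens are not modelled here because the abstract gives no detail on either.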

