2604.00514v1 Apr 01, 2026 cs.CV

MAESIL: Masked Autoencoder for Enhanced Self-supervised Medical Image Learning

Kyeonghun Kim (Citations: 14, h-index: 1)
K. Liao (Citations: 4, h-index: 1)
Hyuk-Jae Lee (Citations: 25, h-index: 3)
Y. Han (Citations: 7, h-index: 2)
Seoyoung Ju (Citations: 73, h-index: 1)
Yeonju Jean (Citations: 1, h-index: 1)
N. Kim (Citations: 93, h-index: 5)
Hye-Won Jung (Citations: 1, h-index: 1)
Hyunsu Go (Citations: 1, h-index: 1)
Seongbin Park (Citations: 10, h-index: 2)
Soo Yong Kim (Citations: 12, h-index: 3)
Junsu Lim (Citations: 3, h-index: 1)
E. Choi (Citations: 521, h-index: 13)
Seohyoung Park (Citations: 1, h-index: 1)
Gyeongmin Kim (Citations: 11, h-index: 2)
Min-Jin Kwon (Citations: 4, h-index: 1)
Kyungseok Yuh (Citations: 2, h-index: 1)

Abstract

Training deep learning models for three-dimensional (3D) medical imaging, such as Computed Tomography (CT), is fundamentally challenged by the scarcity of labeled data. Pre-training on natural images is common, but it introduces a significant domain shift that limits performance. Self-Supervised Learning (SSL) on unlabeled medical data has emerged as a powerful solution, yet prominent frameworks often fail to exploit the inherent 3D nature of CT scans. These methods typically process a 3D scan as a collection of independent 2D slices, an approach that discards critical axial coherence and 3D structural context. To address this limitation, we propose the Masked Autoencoder for Enhanced Self-supervised Medical Image Learning (MAESIL), a novel self-supervised learning framework designed to capture 3D structural information efficiently. The core innovation is the 'superpatch', a 3D chunk-based input unit that balances 3D context preservation with computational efficiency. Our framework partitions the volume into superpatches and employs a 3D masked autoencoder with a dual-masking strategy to learn comprehensive spatial representations. We validated our approach on three diverse, large-scale public CT datasets. Our experimental results show that MAESIL achieves significant improvements over existing methods such as AE, VAE, and VQ-VAE in key reconstruction metrics such as PSNR and SSIM. This establishes MAESIL as a robust and practical pre-training solution for 3D medical imaging tasks.
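The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the general idea: partition a CT-like volume into non-overlapping 3D chunks, pick a random subset to mask, and score a reconstruction with PSNR. The function names, the chunk size (16, 32, 32), and the 75% mask ratio are illustrative assumptions and are not taken from the paper. The dual-masking strategy is not specified in the abstract, so plain random masking stands in for it here.

```python
import numpy as np


def partition_superpatches(volume, size):
    """Split a (D, H, W) volume into non-overlapping 3D chunks of shape `size`.

    Hypothetical helper: assumes every dimension is divisible by the chunk size.
    """
    d, h, w = volume.shape
    sd, sh, sw = size
    if d % sd or h % sh or w % sw:
        raise ValueError("volume dimensions must be divisible by the chunk size")
    # Split each axis into (number of chunks, chunk length).
    blocks = volume.reshape(d // sd, sd, h // sh, sh, w // sw, sw)
    # Move the three chunk indices to the front, then flatten them.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(-1, sd, sh, sw)


def random_mask(num_items, mask_ratio, rng):
    """Return a boolean vector with round(num_items * mask_ratio) True entries."""
    num_masked = int(round(num_items * mask_ratio))
    mask = np.zeros(num_items, dtype=bool)
    mask[rng.choice(num_items, size=num_masked, replace=False)] = True
    return mask


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB for intensities spanning `data_range`."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


rng = np.random.default_rng(0)
volume = rng.random((32, 64, 64))            # stand-in for a normalized CT volume
superpatches = partition_superpatches(volume, (16, 32, 32))
mask = random_mask(len(superpatches), 0.75, rng)

# A real model would reconstruct the masked chunks; zeros are a placeholder.
reconstruction = np.where(mask[:, None, None, None], 0.0, superpatches)
score = psnr(superpatches, reconstruction)
```

In this sketch each chunk keeps its full 3D extent, which is the property the abstract contrasts with slice-by-slice 2D processing. An actual encoder would see only the unmasked chunks and a decoder would predict the masked ones.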

Citations: 1, Influential: 0, Altmetric: 6.5, Score: 33.5
