2604.05171v1 Apr 06, 2026 cs.CV

Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI

Mingjie Li (Citations: 7; h-index: 1)

E. Adeli (Citations: 1,664; h-index: 23)

K. Pohl (Citations: 6,641; h-index: 39)

Edward Kim (Citations: 3; h-index: 1)

Yue Zhao (Citations: 8; h-index: 2)

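Learning a robust variational autoencoder (VAE) is an important step in many deep learning applications in medical image analysis, particularly MRI synthesis. Existing brain VAEs tend to focus on single-modality data (e.g., T1-weighted MRI) and overlook the complementary diagnostic value of other modalities such as T2-weighted MRI. This work proposes a modality-aware, anatomically grounded 3D vector-quantized VAE (VQ-VAE) for reconstructing multimodal brain MRI. NeuroQuant first learns a shared latent representation across modalities using multi-axis attention, which lets it capture relationships between distant brain regions. It then uses a dual-stream 3D encoder that explicitly separates modality-invariant anatomical structure from modality-dependent appearance. The anatomical encoding is then discretized with a shared codebook and combined with modality-specific appearance features through Feature-wise Linear Modulation (FiLM) at the decoding stage. The whole approach is trained with a joint 2D/3D strategy that accounts for the slice-based acquisition of 3D MRI data. Extensive experiments on two multimodal brain MRI datasets show that NeuroQuant achieves better reconstruction accuracy than existing VAEs, providing a scalable foundation for downstream generative modeling and cross-modal brain image analysis.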
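The multi-axis attention step can be illustrated with a small NumPy sketch. This is not the NeuroQuant implementation: the paper's exact factorization is not given here, so the code assumes a simple axial scheme in which self-attention runs along depth, height, and width in turn. The function names (`attend_along_axis`, `factorized_axis_attention`) are hypothetical, and the learned query/key/value projections of a real attention layer are replaced by the identity to keep the example short.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_along_axis(feat, axis):
    """Self-attention along one spatial axis of a (D, H, W, C) volume.

    Assumption: queries, keys, and values are the features themselves
    (no learned projections), purely for illustration.
    """
    # Bring the chosen axis next to the channel axis: (..., L, C).
    x = np.moveaxis(feat, axis, -2)
    scale = 1.0 / np.sqrt(x.shape[-1])
    # Pairwise similarity between the L positions on this axis only.
    scores = np.einsum("...ic,...jc->...ij", x, x) * scale
    out = softmax(scores, axis=-1) @ x
    # Restore the original (D, H, W, C) layout.
    return np.moveaxis(out, -2, axis)

def factorized_axis_attention(feat):
    """Apply attention along D, then H, then W, each with a residual connection."""
    for axis in (0, 1, 2):
        feat = feat + attend_along_axis(feat, axis)
    return feat

volume = np.random.default_rng(0).normal(size=(4, 5, 6, 8))  # toy (D, H, W, C) features
mixed = factorized_axis_attention(volume)
print(mixed.shape)
```

The appeal of factorizing is cost: attention over all D·H·W voxels at once compares (D·H·W)² pairs, while the three axial passes compare D·H·W·(D + H + W) pairs, yet after all three passes every voxel has still received information from every other voxel.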

Original Abstract

Learning a robust Variational Autoencoder (VAE) is a fundamental step for many deep learning applications in medical image analysis, such as MRI synthesis. Existing brain VAEs predominantly focus on single-modality data (e.g., T1-weighted MRI), overlooking the complementary diagnostic value of other modalities like T2-weighted MRI. Here, we propose a modality-aware and anatomically grounded 3D vector-quantized VAE (VQ-VAE) for reconstructing multi-modal brain MRIs. Called NeuroQuant, it first learns a shared latent representation across modalities using factorized multi-axis attention, which can capture relationships between distant brain regions. It then employs a dual-stream 3D encoder that explicitly separates the encoding of modality-invariant anatomical structures from modality-dependent appearance. Next, the anatomical encoding is discretized using a shared codebook and combined with modality-specific appearance features via Feature-wise Linear Modulation (FiLM) during the decoding phase. This entire approach is trained using a joint 2D/3D strategy in order to account for the slice-based acquisition of 3D MRI data. Extensive experiments on two multi-modal brain MRI datasets demonstrate that NeuroQuant achieves superior reconstruction fidelity compared to existing VAEs, enabling a scalable foundation for downstream generative modeling and cross-modal brain image analysis.
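The codebook and FiLM steps in the abstract can be made concrete with a toy NumPy sketch. It rests on stated assumptions and is not the authors' code: the codebook is a hand-made array, the "anatomical" features are synthetic, and the per-modality FiLM parameters are hard-coded, whereas in the described model they would presumably come from the appearance stream of the encoder. The names `quantize`, `film`, and `appearance` are hypothetical. Training details of VQ-VAEs (straight-through gradients, commitment loss) are left out.

```python
import numpy as np

def quantize(z, codebook):
    """Map each feature vector in z (N, C) to its nearest codebook entry (K, C)."""
    # Squared Euclidean distance between every feature and every code: (N, K).
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]

def film(features, gamma, beta):
    """Feature-wise Linear Modulation: per-channel scale and shift."""
    return gamma * features + beta

# Toy shared codebook: 16 codes with 4 channels, code k = [k, k, k, k].
codebook = np.arange(16, dtype=float)[:, None] * np.ones((1, 4))

# Synthetic anatomical features lying close to codes 3, 7, and 7.
z_anat = codebook[[3, 7, 7]] + 0.1
idx, z_q = quantize(z_anat, codebook)

# Hypothetical modality-specific appearance parameters (gamma, beta).
appearance = {
    "T1": (np.full(4, 1.5), np.zeros(4)),
    "T2": (np.full(4, 0.5), np.ones(4)),
}

# The same discrete anatomy is modulated differently for each modality.
decoder_inputs = {m: film(z_q, g, b) for m, (g, b) in appearance.items()}
print(idx.tolist())
for modality, feats in decoder_inputs.items():
    print(modality, feats[0])
```

The point of the split is visible in the last loop: both modalities share the same code indices, so anatomy is stored once, and only the cheap scale-and-shift parameters differ between T1 and T2.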

0 Citations

0 Influential

19.5 Altmetric

97.5 Score
