2605.02714v1 May 04, 2026 cs.CV

OphMAE: Bridging Volumetric and Planar Imaging with a Foundation Model for Adaptive Ophthalmological Diagnosis

Yu Huang (Citations: 0, h-index: 0)
Qingyu Chen (Citations: 2, h-index: 1)
Renjie Liang (Citations: 7, h-index: 2)
Jie Xu (Citations: 14, h-index: 3)
Sunu Mathew (Citations: 10, h-index: 2)
Amir R. Hajrasouliha (Citations: 119, h-index: 6)
A. Saykin (Citations: 629, h-index: 5)
Ruogu Fang (Citations: 50, h-index: 3)
Jiang Bian (Citations: 74, h-index: 3)
Tien-En Chang (Citations: 1, h-index: 1)
Zhen Chen (Citations: 4, h-index: 1)
Jinyu Ding (Citations: 12, h-index: 2)

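The recent emergence of foundation models has opened a new era in medical artificial intelligence (AI), making it possible to extract generalizable representations from large-scale unlabeled datasets. However, most current ophthalmic AI systems operate on a single modality, which is at odds with clinical practice, where diagnosis draws on several complementary imaging sources. In addition, high-performance AI systems are often hard to deploy in resource-limited settings that lack advanced 3D imaging equipment. This study proposes the Ophthalmic multimodal Masked Autoencoder (OphMAE), a foundation model that integrates multiple imaging sources. OphMAE is designed to fuse the volumetric information of 3D optical coherence tomography (OCT) with the planar information of 2D en face OCT. Built on a new cross-modal fusion architecture and an adaptive inference mechanism, OphMAE was pre-trained on 183,875 paired OCT images from 32,765 patients. In a rigorous evaluation on 17 diverse diagnostic tasks comprising 48,340 paired OCT images from 8,191 patients, OphMAE achieved state-of-the-art performance. In particular, it reached an area under the curve (AUC) of 96.9% for age-related macular degeneration (AMD) and 97.2% for diabetic macular edema (DME), outperforming existing single-modality and multimodal foundation models. More importantly, OphMAE maintains high diagnostic accuracy with single-modality 2D inputs alone, reaching 93.7% AUC for AMD, and shows strong data efficiency by retaining 95.7% AUC with only 500 labeled samples. The work presents a scalable and adaptable framework for ophthalmic AI and reports robust performance across different tasks.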

Original Abstract

The advent of foundation models has heralded a new era in medical artificial intelligence (AI), enabling the extraction of generalizable representations from large-scale unlabeled datasets. However, current ophthalmic AI paradigms are predominantly constrained to single-modality inference, thereby creating a dissonance with clinical practice where diagnosis relies on the synthesis of complementary imaging modalities. Furthermore, the deployment of high-performance AI in resource-limited settings is frequently impeded by the unavailability of advanced three-dimensional imaging hardware. Here, we present the Ophthalmic multimodal Masked Autoencoder (OphMAE), a multi-imaging foundation model engineered to synergize the volumetric depth of 3D Optical Coherence Tomography (OCT) with the planar context of 2D en face OCT. By implementing a novel cross-modal fusion architecture and a unique adaptive inference mechanism, OphMAE was pre-trained on a massive dataset of 183,875 paired OCT images derived from 32,765 patients. In a rigorous benchmark encompassing 17 diverse diagnostic tasks with 48,340 paired OCT images from 8,191 patients, the model demonstrated state-of-the-art performance, achieving an Area Under the Curve (AUC) of 96.9% for Age-related Macular Degeneration (AMD) and 97.2% for Diabetic Macular Edema (DME), consistently surpassing existing single-modal and multimodal foundation models. Crucially, OphMAE exhibits robust engineering adaptability: it maintains high diagnostic accuracy, such as 93.7% AUC for AMD, even when restricted to single-modality 2D inputs, and demonstrates exceptional data efficiency by retaining 95.7% AUC with as few as 500 labeled samples. This work establishes a scalable and adaptable framework for ophthalmic AI, ensuring robust performance across different tasks.
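
The headline numbers in the abstract are AUC values. As a reading aid, the following is a minimal sketch of how an AUC can be computed for a binary diagnostic task, using the standard pairwise-ranking definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half). The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not taken from the paper and do not reflect the authors' evaluation code or data.

```python
def roc_auc(labels, scores):
    """AUC via the pairwise-ranking definition.

    labels: iterable of 0/1 ground-truth values (1 = disease present).
    scores: iterable of model scores, higher meaning "more likely positive".
    Returns the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, for illustration only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # prints "AUC = 0.889" (8 of 9 pairs ranked correctly)
```

The quadratic pairwise loop is chosen for clarity; for benchmark-sized datasets a rank-based or library implementation would typically be used instead.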

