2602.21154v1 Feb 24, 2026 cs.AI

CG-DMER: Hybrid Contrastive-Generative Framework for Disentangled Multimodal ECG Representation Learning

Ziwei Niu

Citations: 196

h-index: 7

Shujun Bian

Citations: 3

h-index: 1

Xihong Yang

Citations: 2,161

h-index: 22

Lanfen Lin

Citations: 201

h-index: 8

Yuxin Liu

Citations: 9

h-index: 2

Yueming Jin

Citations: 21

h-index: 2

Hao Sun

Citations: 25

h-index: 2

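Accurate interpretation of electrocardiogram (ECG) signals is critical for diagnosing cardiovascular disease. Multimodal approaches that integrate ECGs with their accompanying clinical reports have recently shown strong potential, but two main problems remain from a modality perspective. (1) Intra-modality: existing models process ECGs in a lead-agnostic manner and overlook spatial-temporal dependencies across leads, which limits their ability to model fine-grained diagnostic patterns. (2) Inter-modality: existing methods align ECG signals directly with clinical reports, which introduces modality-specific biases arising from the free-text nature of the reports. To address these two problems, we propose CG-DMER, a contrastive-generative framework for disentangled multimodal ECG representation learning. CG-DMER is built on two key designs. (1) Spatial-temporal masked modeling masks both the spatial and temporal dimensions and reconstructs the missing information, so as to better capture fine-grained temporal dynamics and inter-lead spatial dependencies. (2) A representation disentanglement and alignment strategy introduces modality-specific and modality-shared encoders to reduce unnecessary noise and modality-specific biases and to clearly separate modality-invariant from modality-specific representations. Experiments on three public datasets show that CG-DMER achieves state-of-the-art performance across a variety of downstream tasks.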

Original Abstract

Accurate interpretation of electrocardiogram (ECG) signals is crucial for diagnosing cardiovascular diseases. Recent multimodal approaches that integrate ECGs with accompanying clinical reports show strong potential, but they still face two main concerns from a modality perspective: (1) intra-modality: existing models process ECGs in a lead-agnostic manner, overlooking spatial-temporal dependencies across leads, which restricts their effectiveness in modeling fine-grained diagnostic patterns; (2) inter-modality: existing methods directly align ECG signals with clinical reports, introducing modality-specific biases due to the free-text nature of the reports. In light of these two issues, we propose CG-DMER, a contrastive-generative framework for disentangled multimodal ECG representation learning, powered by two key designs: (1) Spatial-temporal masked modeling is designed to better capture fine-grained temporal dynamics and inter-lead spatial dependencies by applying masking across both spatial and temporal dimensions and reconstructing the missing information. (2) A representation disentanglement and alignment strategy is designed to mitigate unnecessary noise and modality-specific biases by introducing modality-specific and modality-shared encoders, ensuring a clearer separation between modality-invariant and modality-specific representations. Experiments on three public datasets demonstrate that CG-DMER achieves state-of-the-art performance across diverse downstream tasks.
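
To make the idea of masking across both spatial and temporal dimensions concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the masking scheme, so the choice to hide whole leads (spatial) and whole time patches across all leads (temporal), along with the names `spatiotemporal_mask`, `apply_mask`, `masked_mse` and the parameters `lead_ratio`, `time_ratio`, and `patch_len`, are all assumptions made for illustration.

```python
import random


def spatiotemporal_mask(n_leads, n_patches, lead_ratio, time_ratio, seed=0):
    """Build an n_leads x n_patches grid of booleans; True marks a masked patch.

    Assumed scheme (hypothetical, not taken from the paper): hide a random
    subset of whole leads (spatial masking) and a random subset of time
    patches across every lead (temporal masking).
    """
    rng = random.Random(seed)
    masked_leads = set(rng.sample(range(n_leads), round(n_leads * lead_ratio)))
    masked_times = set(rng.sample(range(n_patches), round(n_patches * time_ratio)))
    return [
        [lead in masked_leads or t in masked_times for t in range(n_patches)]
        for lead in range(n_leads)
    ]


def apply_mask(signal, mask, patch_len, fill=0.0):
    """Return a copy of signal (leads x samples) with masked patches set to fill."""
    return [
        [fill if mask_row[i // patch_len] else x for i, x in enumerate(lead)]
        for lead, mask_row in zip(signal, mask)
    ]


def masked_mse(target, recon, mask, patch_len):
    """Mean squared reconstruction error over masked samples only."""
    total, count = 0.0, 0
    for t_lead, r_lead, mask_row in zip(target, recon, mask):
        for i, (t, r) in enumerate(zip(t_lead, r_lead)):
            if mask_row[i // patch_len]:
                total += (t - r) ** 2
                count += 1
    return total / count if count else 0.0


# Toy 12-lead signal: 10 patches of 4 samples per lead, all ones.
signal = [[1.0] * 40 for _ in range(12)]
mask = spatiotemporal_mask(n_leads=12, n_patches=10, lead_ratio=0.25, time_ratio=0.5)
corrupted = apply_mask(signal, mask, patch_len=4)

# A trained model would reconstruct `signal` from `corrupted`; here the
# corrupted input itself stands in for an untrained reconstruction.
loss = masked_mse(signal, corrupted, mask, patch_len=4)
```

Scoring the reconstruction only on masked positions is the usual choice in masked modeling, since it forces the encoder to infer hidden content from the visible leads and time steps rather than copy its input. Whether CG-DMER uses this exact loss is not stated in the abstract.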

0 Citations

0 Influential

11 Altmetric

55.0 Score
