2602.01839v1 Feb 02, 2026 cs.LG

DOGMA: Weaving Structural Information into Data-centric Single-cell Transcriptomics Analysis

Xunkai Li (citations: 462; h-index: 12)
Daohan Su (citations: 80; h-index: 4)
Ronghua Li (citations: 187; h-index: 7)
Jia Li (citations: 1; h-index: 1)
Hongchao Qin (citations: 370; h-index: 10)
Sicheng Liu (citations: 4; h-index: 1)
Ru Zhang (citations: 3; h-index: 1)
Yaxin Deng (citations: 5; h-index: 2)
Qiangqiang Dai (citations: 462; h-index: 13)
Guoren Wang (citations: 8,890; h-index: 44)

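Data-centric AI has recently become the dominant paradigm in single-cell transcriptomics analysis, on the view that data representation, rather than model complexity, is the fundamental bottleneck. Early sequence-based methods treat cells as independent entities and apply widely used machine learning models directly to the sequencing data. These methods are simple and intuitive, but they overlook the latent intercellular relationships that arise from the functional mechanisms of biological systems, as well as the inherent quality issues of raw sequencing data. Structured methods emerged in response. They use various heuristic rules to capture complex intercellular relationships and to enhance the raw sequencing data, but they often neglect biological prior knowledge. This omission incurs substantial overhead and yields suboptimal graph representations, which limits the utility of machine learning models. To address these problems, we propose DOGMA, a holistic data-centric framework that structurally reshapes and semantically enhances raw data using multi-level biological prior knowledge. Rather than relying on stochastic heuristics, DOGMA redefines graph construction by integrating Statistical Anchors with the Cell Ontology and Phylogenetic Trees, enabling deterministic structure discovery and robust cross-species alignment. It also uses the Gene Ontology to incorporate functional priors, bridging the feature-level semantic gap. On complex multi-species and multi-organ benchmarks, DOGMA achieves state-of-the-art performance, with superior zero-shot robustness and sample efficiency, at significantly lower computational cost.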

Original Abstract

Recently, data-centric AI methodology has been a dominant paradigm in single-cell transcriptomics analysis, which treats data representation rather than model complexity as the fundamental bottleneck. In the review of current studies, earlier sequence methods treat cells as independent entities and adapt prevalent ML models to analyze their directly inherited sequence data. Despite their simplicity and intuition, these methods overlook the latent intercellular relationships driven by the functional mechanisms of biological systems and the inherent quality issues of the raw sequence data. Therefore, a series of structured methods has emerged. Although they employ various heuristic rules to capture intricate intercellular relationships and enhance the raw sequencing data, these methods often neglect biological prior knowledge. This omission incurs substantial overhead and yields suboptimal graph representations, thereby hindering the utility of ML models. To address them, we propose DOGMA, a holistic data-centric framework designed for the structural reshaping and semantic enhancement of raw data through multi-level biological prior knowledge. Transcending reliance on stochastic heuristics, DOGMA redefines graph construction by integrating Statistical Anchors with Cell Ontology and Phylogenetic Trees to enable deterministic structure discovery and robust cross-species alignment. Furthermore, Gene Ontology is utilized to bridge the feature-level semantic gap by incorporating functional priors. In complex multi-species and multi-organ benchmarks, DOGMA achieves SOTA performance, exhibiting superior zero-shot robustness and sample efficiency while operating with significantly lower computational cost.
