2603.01396v1 Mar 02, 2026 cs.AI

HarmonyCell: Automating Single-Cell Perturbation Modeling under Semantic and Distribution Shifts

Wenxuan Huang (Citations: 92, h-index: 4)
Mingyu Tsoi (Citations: 3, h-index: 1)
Yan Huang (Citations: 6, h-index: 1)
Xinjie Mao (Citations: 37, h-index: 3)
Xue Xia (Citations: 81, h-index: 2)
Jiaqi Wei (Citations: 2, h-index: 1)
Xiangxiang Zhang (Citations: 140, h-index: 2)
Hao Wu (Citations: 101, h-index: 3)
Yuejin Yang (Citations: 121, h-index: 5)
Lang Yu (Citations: 115, h-index: 3)
Cheng Tan (Citations: 10, h-index: 2)
Zhan Gao (Citations: 69, h-index: 5)
Siqi Sun (Citations: 47, h-index: 4)

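Single-cell perturbation studies face two types of heterogeneity: (i) semantic heterogeneity, where the same biological concept is represented under incompatible metadata schemas across datasets; and (ii) statistical heterogeneity, where distribution shifts arising from biological variation demand dataset-specific inductive biases. We propose HarmonyCell, a unified framework with a dedicated mechanism for each problem. An LLM-based Semantic Unifier automatically maps disparate metadata to a canonical interface without manual intervention. An adaptive Monte Carlo Tree Search engine operates over a hierarchical action space to synthesize architectures whose statistical inductive biases are suited to the distribution shift. Evaluated on diverse perturbation tasks under both semantic and distribution shifts, HarmonyCell achieved a 95% valid execution rate on heterogeneous input datasets (versus 0% for general agents) and matched or exceeded expert-designed baselines in rigorous generalization evaluations. This dual-track orchestration enables scalable, automatic virtual cell modeling without dataset-specific customization.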

Original Abstract

Single-cell perturbation studies face dual heterogeneity bottlenecks: (i) semantic heterogeneity--identical biological concepts encoded under incompatible metadata schemas across datasets; and (ii) statistical heterogeneity--distribution shifts from biological variation demanding dataset-specific inductive biases. We propose HarmonyCell, an end-to-end agent framework resolving each challenge through a dedicated mechanism: an LLM-driven Semantic Unifier autonomously maps disparate metadata into a canonical interface without manual intervention; and an adaptive Monte Carlo Tree Search engine operates over a hierarchical action space to synthesize architectures with optimal statistical inductive biases for distribution shifts. Evaluated across diverse perturbation tasks under both semantic and distribution shifts, HarmonyCell achieves a 95% valid execution rate on heterogeneous input datasets (versus 0% for general agents) while matching or even exceeding expert-designed baselines in rigorous out-of-distribution evaluations. This dual-track orchestration enables scalable automatic virtual cell modeling without dataset-specific engineering.
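The abstract does not say how the adaptive Monte Carlo Tree Search engine scores candidate actions. As a rough illustration only, the sketch below shows the textbook UCT selection rule that standard MCTS uses to balance exploring untried choices against exploiting well-performing ones. The `Node` class, the action labels such as `encoder=mlp`, and the reward values are hypothetical and are not taken from HarmonyCell.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A search-tree node; `action` is a hypothetical architecture choice."""
    action: str
    visits: int = 0
    total_reward: float = 0.0
    children: list["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Standard UCT: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean_reward = child.total_reward / child.visits
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return mean_reward + bonus


def select_child(parent: Node) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits))


# Hypothetical first level of a hierarchical action space (illustrative values).
visited_only = Node("root", visits=10, children=[
    Node("encoder=mlp", visits=6, total_reward=3.0),
    Node("encoder=transformer", visits=4, total_reward=2.4),
])
with_untried = Node("root", visits=10, children=[
    Node("encoder=mlp", visits=6, total_reward=3.0),
    Node("encoder=vae"),  # never visited
])

print(select_child(visited_only).action)
print(select_child(with_untried).action)
```

In a hierarchical action space, the same rule would be applied again at each level below the selected child (for example, choosing a loss function after choosing an encoder); whatever makes the paper's search "adaptive" is not described in the abstract and is not modeled here.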

1 Citations

0 Influential

2.5 Altmetric

13.5 Score
