2604.13427v1 Apr 15, 2026 cs.GR

A Unified Conditional Flow for Motion Generation, Editing, and Intra-Structural Retargeting

Haibin Huang

Yilin Zhao

Xin Song

Junli Li

Siqi Wang

Abstract

Text-driven motion editing and intra-structural retargeting, where source and target share topology but may differ in bone lengths, are traditionally handled by fragmented pipelines with incompatible inputs and representations: editing relies on specialized generative steering, while retargeting is deferred to geometric post-processing. We present a unifying perspective where both tasks are cast as instances of conditional transport within a single generative framework. By leveraging recent advances in flow matching, we demonstrate that editing and retargeting are fundamentally the same generative task, distinguished only by which conditioning signal, semantic or structural, is modulated during inference. We implement this vision via a rectified-flow motion model jointly conditioned on text prompts and target skeletal structures. Our architecture extends a DiT-style transformer with per-joint tokenization and explicit joint self-attention to strictly enforce kinematic dependencies, while a multi-condition classifier-free guidance strategy balances text adherence with skeletal conformity. Experiments on SnapMoGen and a multi-character Mixamo subset show that a single trained model supports text-to-motion generation, zero-shot editing, and zero-shot intra-structural retargeting. This unified approach simplifies deployment and improves structural consistency compared to task-specific baselines.
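
The abstract does not spell out the guidance rule, so the following is only a minimal sketch of how multi-condition classifier-free guidance might be combined with Euler integration of a rectified flow. It assumes one common additive composition (unconditional velocity plus a weighted offset per condition); the paper's actual formulation may differ. All names here (`guided_velocity`, `sample`, `toy_velocity`, `w_text`, `w_skel`) are hypothetical, and `toy_velocity` is a stand-in for the trained network, not the authors' model.

```python
import numpy as np

def guided_velocity(v_uncond, v_text, v_skel, w_text, w_skel):
    """Additive multi-condition guidance (an assumed form, not the paper's):
    start from the unconditional velocity and move along each condition's
    direction by its own weight."""
    return v_uncond + w_text * (v_text - v_uncond) + w_skel * (v_skel - v_uncond)

def sample(velocity_fn, x0, text, skeleton, w_text=1.0, w_skel=1.0, steps=10):
    """Euler-integrate dx/dt = v(x, t, text, skeleton) from t=0 to t=1."""
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        # Three evaluations per step: no condition, text only, skeleton only.
        v_u = velocity_fn(x, t, None, None)
        v_t = velocity_fn(x, t, text, None)
        v_s = velocity_fn(x, t, None, skeleton)
        x = x + dt * guided_velocity(v_u, v_t, v_s, w_text, w_skel)
    return x

def toy_velocity(x, t, text, skeleton):
    """Hypothetical stand-in for a trained velocity network: each supplied
    condition contributes a constant drift; None means 'condition dropped'."""
    v = np.zeros_like(x)
    if text is not None:
        v = v + text
    if skeleton is not None:
        v = v + skeleton
    return v

# Usage: vary only one guidance weight to emphasise one conditioning signal.
x0 = np.zeros(3)
text_cond = np.array([1.0, 0.0, 0.0])
skel_cond = np.array([0.0, 2.0, 0.0])
text_heavy = sample(toy_velocity, x0, text_cond, skel_cond, w_text=2.0, w_skel=0.5)
skel_heavy = sample(toy_velocity, x0, text_cond, skel_cond, w_text=0.5, w_skel=2.0)
```

In this sketch the two weights are independent knobs, which mirrors the abstract's framing that editing and retargeting differ only in which conditioning signal, semantic or structural, is modulated at inference time.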
