2602.07595v1 Feb 07, 2026 cs.CV

TeleBoost: A Systematic Alignment Framework for High-Fidelity, Controllable, and Robust Video Generation

Yuanzhi Liang (Citations: 300, h-index: 6)
Xuaner Wu (Citations: 12, h-index: 1)
Yirui Liu (Citations: 3, h-index: 1)
Yijie Fang (Citations: 10, h-index: 1)
Yizhe Fan (Citations: 14, h-index: 2)
Kexian Hao (Citations: 2, h-index: 1)
Rui Li (Citations: 23, h-index: 3)
Ruiying Liu (Citations: 6, h-index: 2)
Ziqi Ni (Citations: 15, h-index: 2)
Peng Yu (Citations: 9, h-index: 1)
Yanbo Wang (Citations: 43, h-index: 2)
Haibin Huang (Citations: 65, h-index: 4)
Qizhen Weng (Citations: 27, h-index: 3)
Chi Zhang (Citations: 52, h-index: 4)
Xuelong Li (Citations: 8, h-index: 2)

Original Abstract

Post-training is the decisive step for converting a pretrained video generator into a production-oriented model that is instruction-following, controllable, and robust over long temporal horizons. This report presents a systematic post-training framework that organizes supervised policy shaping, reward-driven reinforcement learning, and preference-based refinement into a single stability-constrained optimization stack. The framework is designed around practical video-generation constraints, including high rollout cost, temporally compounding failure modes, and feedback that is heterogeneous, uncertain, and often weakly discriminative. By treating optimization as a staged, diagnostic-driven process rather than a collection of isolated tricks, the report summarizes a cohesive recipe for improving perceptual fidelity, temporal coherence, and prompt adherence while preserving the controllability established at initialization. The resulting framework provides a clear blueprint for building scalable post-training pipelines that remain stable, extensible, and effective in real-world deployment settings.
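The abstract describes a staged, diagnostic-driven process but gives no implementation details. As a rough illustration only, the idea of running stages in order and accepting each one only if protected properties do not regress might be sketched as below. This is not the authors' code: the names `Stage`, `StagedPipeline`, the metric keys, the tolerance value, and the toy stage functions are all hypothetical, and real stages would train a model rather than edit a dictionary of scores.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Stage:
    name: str
    # Hypothetical stand-in for a training stage: takes the current
    # diagnostic metrics and returns the metrics measured after the stage.
    train: Callable[[Metrics], Metrics]


@dataclass
class StagedPipeline:
    stages: List[Stage]
    protected: List[str]      # metrics that must not regress between stages
    tolerance: float = 0.02   # assumed slack allowed on protected metrics
    log: List[str] = field(default_factory=list)

    def run(self, metrics: Metrics) -> Metrics:
        for stage in self.stages:
            # Each stage works on a copy so a rejected stage leaves no trace.
            candidate = stage.train(dict(metrics))
            regressed = [
                key for key in self.protected
                if candidate[key] < metrics[key] - self.tolerance
            ]
            if regressed:
                self.log.append(
                    f"{stage.name}: rejected ({', '.join(regressed)} regressed)"
                )
                continue
            metrics = candidate
            self.log.append(f"{stage.name}: accepted")
        return metrics


# Toy stages that only adjust scores, purely for illustration.
def supervised_shaping(m: Metrics) -> Metrics:
    m["prompt_adherence"] += 0.10
    return m


def reward_rl(m: Metrics) -> Metrics:
    m["fidelity"] += 0.05
    m["controllability"] -= 0.10  # simulated regression in a protected metric
    return m


def preference_refinement(m: Metrics) -> Metrics:
    m["temporal_coherence"] += 0.05
    return m


pipeline = StagedPipeline(
    stages=[
        Stage("supervised_shaping", supervised_shaping),
        Stage("reward_rl", reward_rl),
        Stage("preference_refinement", preference_refinement),
    ],
    protected=["controllability", "temporal_coherence"],
)
result = pipeline.run({
    "prompt_adherence": 0.5,
    "fidelity": 0.6,
    "temporal_coherence": 0.6,
    "controllability": 0.8,
})
print(pipeline.log)
```

In this toy run the simulated reinforcement learning stage is rejected because it lowers controllability beyond the tolerance, which mirrors the abstract's stated goal of preserving the controllability established at initialization. How the actual framework gates or constrains its stages is not specified in the abstract.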
