2605.07545v1 May 08, 2026 cs.CV

Implicit Preference Alignment for Human Image Animation

Qinglin Lu (Citations: 1,857; h-index: 13)

Yuanzhi Wang (Citations: 302; h-index: 6)

Xuhua Ren (Citations: 13; h-index: 2)

Jiaxiang Cheng (Citations: 9; h-index: 2)

Bing Ma (Citations: 1; h-index: 1)

Tianxiang Zheng (Citations: 13; h-index: 2)

Zhen Cui (Citations: 18; h-index: 2)

Kai Yu (Citations: 9; h-index: 2)

Abstract

Human image animation has witnessed significant advancements, yet generating high-fidelity hand motions remains a persistent challenge due to their high degrees of freedom and motion complexity. While reinforcement learning from human feedback, particularly direct preference optimization, offers a potential solution, it requires the construction of strict preference pairs. However, curating such pairs for dynamic hand regions is prohibitively expensive and often impractical due to frame-wise inconsistencies. In this paper, we propose Implicit Preference Alignment (IPA), a data-efficient post-training framework that eliminates the need for paired preference data. Theoretically grounded in implicit reward maximization, IPA aligns the model by maximizing the likelihood of self-generated high-quality samples while penalizing deviations from the pretrained prior. Furthermore, we introduce a Hand-Aware Local Optimization mechanism to explicitly steer the alignment process toward hand regions. Experiments demonstrate that our method achieves effective preference optimization that enhances hand generation quality, while significantly lowering the barrier to constructing preference data. Code is released at https://github.com/mdswyz/IPA
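The abstract names two ingredients: a term that favors self-generated high-quality samples while penalizing drift from the pretrained prior, and extra emphasis on hand regions. The toy sketch below only illustrates that general shape and is not the paper's objective. It assumes a squared-error fit term as a stand-in for the likelihood, a squared-error distance to a frozen reference prediction as the prior penalty, and a binary hand mask. The names `toy_alignment_loss`, `beta`, and `hand_weight` are hypothetical and do not come from the paper or its repository.

```python
from typing import Sequence


def weighted_mse(a: Sequence[float], b: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Weighted mean of squared differences between two flat sequences."""
    total = sum(weights)
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)) / total


def toy_alignment_loss(pred: Sequence[float],
                       target: Sequence[float],
                       ref_pred: Sequence[float],
                       hand_mask: Sequence[int],
                       beta: float = 0.1,
                       hand_weight: float = 4.0) -> float:
    """Illustrative loss only (an assumption, not the IPA objective).

    pred      : prediction of the model being aligned
    target    : a self-generated sample judged high quality
    ref_pred  : prediction of the frozen pretrained model
    hand_mask : 1 where an element lies in a hand region, else 0
    """
    # Elements inside the hand mask get weight `hand_weight`, others 1.
    weights = [1.0 + (hand_weight - 1.0) * m for m in hand_mask]
    # Fit term: stay close to the high-quality sample, hands emphasized.
    fit = weighted_mse(pred, target, weights)
    # Prior term: penalize drift from the pretrained model, uniformly.
    drift = weighted_mse(pred, ref_pred, [1.0] * len(pred))
    return fit + beta * drift


if __name__ == "__main__":
    pred = [0.0, 0.0, 0.0, 0.0]
    target = [1.0, 0.0, 0.0, 0.0]
    # The same error costs more when it falls inside the hand mask.
    in_hand = toy_alignment_loss(pred, target, pred, [1, 0, 0, 0])
    off_hand = toy_alignment_loss(pred, target, pred, [0, 0, 0, 0])
    print(in_hand > off_hand)
```

In this sketch `beta` trades off fitting the selected sample against staying near the reference model, which mirrors the role such a coefficient usually plays in preference-optimization objectives. How the paper actually sets or derives these terms is not stated in the abstract.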

