2605.04702v1 May 06, 2026 cs.CV

FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation

Yuanzhi Wang

Citations: 302

h-index: 6

Xuhua Ren

Citations: 13

h-index: 2

Jiaxiang Cheng

Citations: 9

h-index: 2

Bing Ma

Citations: 1

h-index: 1

Kai Yu

Citations: 72

h-index: 4

Sen Liang

Citations: 7

h-index: 2

Wenyue Li

Citations: 303

h-index: 2

Tianxiang Zheng

Citations: 13

h-index: 2

Qinglin Lu

Citations: 7

h-index: 2

Zhen Cui

Citations: 18

h-index: 2

Original Abstract

Identity-preserving text-to-video generation (IPT2V) empowers users to produce diverse and imaginative videos with consistent human facial identity. Despite recent progress, existing methods often suffer from significant identity distortion under large facial pose variations or facial occlusions. In this paper, we propose FaithfulFaces, a pose-faithful facial identity preservation learning framework to improve IPT2V in complex dynamic scenes. The key of FaithfulFaces is a pose-shared identity aligner that refines and aligns facial poses across distinct views via a pose-shared dictionary and a pose variation-identity invariance constraint. By mapping single-view inputs into a global facial pose representation with explicit Euler angle embeddings, FaithfulFaces provides a pose-faithful facial prior that guides generative foundations toward robust identity-preserving generation. In particular, we develop a specialized pipeline to curate a high-quality video dataset featuring substantial facial pose diversity. Extensive experiments demonstrate that FaithfulFaces achieves state-of-the-art performance, maintaining superior identity consistency and structural clarity even as pose changes and occlusions occur.
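
The abstract mentions explicit Euler angle embeddings but does not say how they are computed. As a rough illustration only, one common way to turn a head pose (yaw, pitch, roll) into a fixed-length vector is a sinusoidal encoding. The function name `euler_angle_embedding`, the `num_freqs` parameter, and the choice of a sinusoidal encoding are all assumptions made for this sketch, not details taken from the paper.

```python
import math


def euler_angle_embedding(yaw, pitch, roll, num_freqs=4):
    """Hypothetical sinusoidal embedding of a head pose.

    Angles are in degrees. This is an illustrative assumption, not the
    embedding used by FaithfulFaces.
    """
    features = []
    for angle_deg in (yaw, pitch, roll):
        angle = math.radians(angle_deg)
        for k in range(num_freqs):
            freq = 2 ** k  # integer frequencies keep the embedding 360-degree periodic
            features.append(math.sin(freq * angle))
            features.append(math.cos(freq * angle))
    return features


# A frontal face and a face turned 45 degrees to the side map to different vectors.
frontal = euler_angle_embedding(0.0, 0.0, 0.0)
profile = euler_angle_embedding(45.0, 0.0, 0.0)
```

Each angle contributes `2 * num_freqs` values, so the default setting yields a 24-dimensional vector. Because only integer frequencies are used, poses that differ by a full turn receive the same embedding, which is one reason such encodings are often preferred over feeding raw angles to a network.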
