2603.20755v1 Mar 21, 2026 cs.CV

Memory-Efficient Fine-Tuning Diffusion Transformers via Dynamic Patch Sampling and Block Skipping

F. Porikli (Citations: 593, h-index: 9)

Jaegul Choo (Citations: 314, h-index: 10)

Sunghyun Park (Citations: 10, h-index: 2)

Jeongho Kim (Citations: 241, h-index: 4)

Hyoungwoo Park (Citations: 351, h-index: 6)

Debasmit Das (Citations: 665, h-index: 11)

Sungrack Yun (Citations: 22, h-index: 3)

Munawar Hayat (Citations: 56, h-index: 4)

Seokeon Choi (Citations: 18, h-index: 2)

Original Abstract

Diffusion Transformers (DiTs) have significantly enhanced text-to-image (T2I) generation quality, enabling high-quality personalized content creation. However, fine-tuning these models requires substantial computation and memory, limiting practical deployment under resource constraints. To tackle these challenges, we propose a memory-efficient fine-tuning framework called DiT-BlockSkip, which integrates timestep-aware dynamic patch sampling with block skipping based on precomputed residual features. Our dynamic patch sampling strategy adjusts patch sizes based on the diffusion timestep, then resizes the cropped patches to a fixed lower resolution. This approach reduces forward and backward memory usage while allowing the model to capture global structures at higher timesteps and fine-grained details at lower timesteps. The block skipping mechanism selectively fine-tunes essential transformer blocks and precomputes residual features for the skipped blocks, significantly reducing training memory. To identify vital blocks for personalization, we introduce a block selection strategy based on cross-attention masking. Evaluations demonstrate that our approach achieves competitive personalization performance qualitatively and quantitatively, while reducing memory usage substantially, moving toward on-device feasibility (e.g., smartphones, IoT devices) for large-scale diffusion transformers.
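
The timestep-aware patch sampling described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the linear size schedule, the function names (`crop_size_for_timestep`, `sample_patch_box`, `crop_and_resize`), and the default sizes (1024-pixel image, 256-pixel minimum crop, 1000 timesteps) are all assumptions made for illustration. The only properties taken from the abstract are that the crop grows with the timestep and that every crop is resized to one fixed lower resolution.

```python
import random


def crop_size_for_timestep(t, num_timesteps=1000, image_size=1024, min_size=256):
    """Map a diffusion timestep to a square crop size.

    Assumed linear schedule: t = num_timesteps - 1 covers the whole image
    (global structure), t = 0 gives the smallest crop (fine detail).
    """
    frac = t / (num_timesteps - 1)
    return int(round(min_size + frac * (image_size - min_size)))


def sample_patch_box(t, rng, num_timesteps=1000, image_size=1024, min_size=256):
    """Pick a random square crop (top, left, size) for timestep t."""
    size = crop_size_for_timestep(t, num_timesteps, image_size, min_size)
    max_offset = image_size - size
    top = rng.randint(0, max_offset)   # randint is inclusive on both ends
    left = rng.randint(0, max_offset)
    return top, left, size


def crop_and_resize(image, box, out_size):
    """Crop a square region and resize it to out_size x out_size.

    `image` is a list of rows. Nearest-neighbor resizing is used only to keep
    the sketch dependency-free; a real pipeline would likely use a library
    resampler.
    """
    top, left, size = box
    resized = []
    for i in range(out_size):
        src_row = top + (i * size) // out_size
        resized.append(
            [image[src_row][left + (j * size) // out_size] for j in range(out_size)]
        )
    return resized


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0, 500, 999):
        print(t, sample_patch_box(t, rng))
```

Because `out_size` is the same for every timestep, the number of pixels (and hence tokens) entering the model stays constant no matter how large the sampled crop is, which is where the forward and backward memory saving would come from under these assumptions.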

