2604.05489v1 Apr 07, 2026 cs.AI

SCMAPR: Self-Correcting Multi-Agent Prompt Refinement for Complex-Scenario Text-to-Video Generation

Aimin Zhou

Citations: 5

h-index: 1

Chengyi Yang

Citations: 149

h-index: 3

Pengzhen Li

Citations: 88

h-index: 5

Jiayi Qi

Citations: 10

h-index: 1

Ji Wu

Citations: 22

h-index: 2

Ji Liu

Citations: 3,260

h-index: 4

Original Abstract

Text-to-Video (T2V) generation has benefited from recent advances in diffusion models, yet current systems still struggle under complex scenarios, which are generally exacerbated by the ambiguity and underspecification of text prompts. In this work, we formulate complex-scenario prompt refinement as a stage-wise multi-agent refinement process and propose SCMAPR, i.e., a scenario-aware and Self-Correcting Multi-Agent Prompt Refinement framework for T2V prompting. SCMAPR coordinates specialized agents to (i) route each prompt to a taxonomy-grounded scenario for strategy selection, (ii) synthesize scenario-aware rewriting policies and perform policy-conditioned refinement, and (iii) conduct structured semantic verification that triggers conditional revision when violations are detected. To clarify what constitutes complex scenarios in T2V prompting, provide representative examples, and enable rigorous evaluation under such challenging conditions, we further introduce T2V-Complexity, which is a complex-scenario T2V benchmark consisting exclusively of complex-scenario prompts. Extensive experiments on 3 existing benchmarks and our T2V-Complexity benchmark demonstrate that SCMAPR consistently improves text-video alignment and overall generation quality under complex scenarios, achieving up to 2.67% and 3.28 gains in average score on VBench and EvalCrafter, and up to 0.028 improvement on T2V-CompBench over 3 State-Of-The-Art baselines.
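
The three stages named in the abstract (routing, policy-conditioned refinement, and verification with conditional revision) can be pictured as a small control loop. The sketch below is only an illustration of that control flow, not the authors' implementation: the function names, the agent signatures, the `max_revisions` cap, and the toy word-matching agents are all assumptions made here, and a real system would presumably back each agent with a language model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical agent interfaces; the abstract does not specify them.
Router = Callable[[str], str]                # prompt -> scenario label
PolicyMaker = Callable[[str, str], str]      # (prompt, scenario) -> rewriting policy
Refiner = Callable[[str, str], str]          # (prompt, policy) -> refined prompt
Verifier = Callable[[str, str], List[str]]   # (original, refined) -> violations
Reviser = Callable[[str, List[str]], str]    # (refined, violations) -> revised prompt


@dataclass
class RefinementResult:
    scenario: str
    policy: str
    prompt: str
    violations: List[str]


def refine_prompt(prompt: str, route: Router, make_policy: PolicyMaker,
                  refine: Refiner, verify: Verifier, revise: Reviser,
                  max_revisions: int = 2) -> RefinementResult:
    scenario = route(prompt)                  # (i) scenario routing
    policy = make_policy(prompt, scenario)    # (ii) policy synthesis
    refined = refine(prompt, policy)          # (ii) policy-conditioned refinement
    violations = verify(prompt, refined)      # (iii) semantic verification
    rounds = 0
    # Revision is conditional: it runs only while violations remain.
    while violations and rounds < max_revisions:
        refined = revise(refined, violations)
        violations = verify(prompt, refined)
        rounds += 1
    return RefinementResult(scenario, policy, refined, violations)


# Toy stand-ins for the agents (assumptions, for demonstration only).
def toy_route(prompt: str) -> str:
    return "multi-object" if " and " in prompt else "single-object"


def toy_policy(prompt: str, scenario: str) -> str:
    return f"describe every entity explicitly ({scenario})"


def toy_refine(prompt: str, policy: str) -> str:
    # Deliberately lossy so that verification has something to catch.
    return prompt.split(" and ")[0] + " in cinematic lighting"


def toy_verify(original: str, refined: str) -> List[str]:
    kept = set(refined.lower().split())
    return [f"missing: {w}" for w in original.lower().split() if w not in kept]


def toy_revise(refined: str, violations: List[str]) -> str:
    return refined + " " + " ".join(v.split(": ", 1)[1] for v in violations)


result = refine_prompt("a dog and a cat play", toy_route, toy_policy,
                       toy_refine, toy_verify, toy_revise)
print(result.scenario, "|", result.prompt, "|", result.violations)
```

In this toy run the lossy refiner drops the second subject, the verifier reports the missing words, and a single revision round restores them. Capping the loop with `max_revisions` is a design choice of this sketch so that a verifier that never passes cannot loop forever.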

1 Citations

0 Influential

2.5 Altmetric

13.5 Score
