2602.07095v1 Feb 06, 2026 cs.CV

WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark

Wang Lin

Citations: 107

h-index: 5

Feng Wang

Citations: 332

h-index: 7

Majun Zhang

Citations: 23

h-index: 3

Wentao Hu

Citations: 95

h-index: 5

Tao Jin

Citations: 4

h-index: 1

Zhou Zhao

Citations: 183

h-index: 8

Fei Wu

Citations: 9

h-index: 1

Jingyuan Chen

Citations: 557

h-index: 15

Alan L. Yuille

Citations: 1,431

h-index: 20

Sucheng Ren

Citations: 402

h-index: 9

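Recent advances in image editing models have shown a remarkable ability to execute explicit instructions such as attribute manipulation, style transfer, and pose synthesis. However, these models often struggle with implicit editing instructions, which state the cause of a visual change without explicitly describing the resulting outcome. This limitation arises because existing models rely on uniform editing strategies that are poorly suited to implicit instructions, which demand complex world knowledge and reasoning. To close this gap, we introduce WorldEdit, a dataset designed specifically to enable world-driven image editing. WorldEdit consists of high-quality editing samples guided by paraphrased instructions that are consistent with real-world causal logic. We also provide WorldEdit-Test for evaluating how existing models perform in causal editing scenarios. Using WorldEdit, we propose a two-stage training framework that fine-tunes models such as Bagel and incorporates a causal verification reward. Our results show that the proposed dataset and method substantially narrow the gap with GPT-4o and Nano-Banana, achieving competitive performance not only in instruction following but also in knowledge plausibility, where many open-source systems struggle.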

Original Abstract

Recent advances in image editing models have demonstrated remarkable capabilities in executing explicit instructions, such as attribute manipulation, style transfer, and pose synthesis. However, these models often struggle with implicit editing instructions, which describe the cause of a visual change without explicitly detailing the resulting outcome. This limitation arises because existing models rely on uniform editing strategies that are not equipped to handle the complex world knowledge and reasoning that implicit instructions require. To address this gap, we introduce WorldEdit, a dataset specifically designed to enable world-driven image editing. WorldEdit consists of high-quality editing samples, guided by paraphrased instructions that align with real-world causal logic. Furthermore, we provide WorldEdit-Test for evaluating existing models' performance on causal editing scenarios. With WorldEdit, we use a two-stage training framework that integrates a causal verification reward to fine-tune models such as Bagel. Our results show that the proposed dataset and methods significantly narrow the gap with GPT-4o and Nano-Banana, demonstrating competitive performance not only in instruction following but also in knowledge plausibility, where many open-source systems typically struggle.

1 Citations

0 Influential

10 Altmetric

51.0 Score
