2605.14534v1 May 14, 2026 cs.CV

PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media

Jian Luan (Citations: 220, h-index: 7)
Jiagao Hu (Citations: 61, h-index: 4)
Daiguo Zhou (Citations: 44, h-index: 4)
Fuhao Li (Citations: 52, h-index: 3)
Zepeng Wang (Citations: 43, h-index: 3)
Shaofeng You (Citations: 1, h-index: 1)
Yu Liu (Citations: 160, h-index: 7)
Yuxuan Chen (Citations: 71, h-index: 3)
Fei Wang (Citations: 4, h-index: 1)

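Evaluating object removal in images and videos remains difficult: the task is inherently one-to-many, yet existing metrics often disagree with human perceptual judgment. Full-reference metrics score copy-paste behavior higher than genuine removal, no-reference metrics are systematically biased toward blurry results, and global temporal metrics fail to detect localized artifacts inside the edited region. To address these limitations, we propose RC (Removal Coherence), a pair of perception-aligned metrics. RC-S measures spatial coherence through sliding-window feature comparison, and RC-T measures temporal consistency by tracking distributions within the restored region shared by adjacent frames. To validate RC and support community benchmarking, we also introduce PROVE-Bench, a two-tier real-world benchmark consisting of PROVE-M, a paired dataset of 80 videos with motion augmentation, and PROVE-H, a challenging subset of 100 videos without ground truth. Together, the RC metrics and PROVE-Bench form the PROVE (Perceptual RemOVal cohErence) framework for evaluating object removal in visual media. Experiments on diverse image and video benchmarks show that RC agrees with human judgments substantially better than existing evaluation methods. Code for the RC metrics and PROVE-Bench is publicly available at https://github.com/xiaomi-research/prove/.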
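The idea behind a sliding-window spatial coherence score can be illustrated with a toy sketch. This is not the authors' RC-S implementation: the abstract does not specify the feature extractor, window size, or distance, so the per-window (mean, standard deviation) feature, the `exp(-distance)` scoring, and the names `window_feature`, `spatial_coherence`, and `make_image` are all assumptions made for illustration.

```python
import math

def window_feature(img, top, left, size):
    """Hypothetical stand-in feature: (mean, std) of intensities in one window."""
    vals = [img[r][c] for r in range(top, top + size)
            for c in range(left, left + size)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return (mean, math.sqrt(var))

def spatial_coherence(img, mask, size=4, stride=2):
    """Toy score in (0, 1]; higher means masked windows resemble background windows."""
    h, w = len(img), len(img[0])
    masked, background = [], []
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            covered = sum(mask[r][c] for r in range(top, top + size)
                          for c in range(left, left + size))
            feat = window_feature(img, top, left, size)
            if covered == size * size:
                masked.append(feat)      # window lies fully inside the removal mask
            elif covered == 0:
                background.append(feat)  # window lies fully outside the mask
    if not masked or not background:
        raise ValueError("need at least one masked and one background window")
    # Distance from each masked window to its closest background window.
    dists = [min(math.dist(m, b) for b in background) for m in masked]
    return sum(math.exp(-d) for d in dists) / len(dists)

def make_image(fill, n=12, lo=4, hi=8, bg=0.2):
    """Toy data: flat background with a square 'restored' region filled with `fill`."""
    img = [[bg] * n for _ in range(n)]
    mask = [[0] * n for _ in range(n)]
    for r in range(lo, hi):
        for c in range(lo, hi):
            img[r][c] = fill
            mask[r][c] = 1
    return img, mask

coherent_score = spatial_coherence(*make_image(fill=0.2))  # fill matches background
poor_score = spatial_coherence(*make_image(fill=0.9))      # fill clashes with background
print(coherent_score, poor_score)
```

In this toy setup the fill that matches the background scores higher than the clashing fill. A realistic version would presumably replace the hand-made statistics with learned features, which the abstract leaves unspecified.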

Original Abstract

Evaluating object removal in images and videos remains challenging because the task is inherently one-to-many, yet existing metrics frequently disagree with human perception. Full-reference metrics reward copy-paste behaviors over genuine erasure; no-reference metrics suffer from systematic biases such as favoring blurry results; and global temporal metrics are insensitive to localized artifacts within edited regions. To address these limitations, we propose RC (Removal Coherence), a pair of perception-aligned metrics: RC-S, which measures spatial coherence via sliding-window feature comparison between masked and background regions, and RC-T, which measures temporal consistency via distribution tracking within shared restored regions across adjacent frames. To validate RC and support community benchmarking, we further introduce PROVE-Bench, a two-tier real-world benchmark comprising PROVE-M, an 80-video paired dataset with motion augmentation, and PROVE-H, a 100-video challenging subset without ground truth. Together, RC metrics and PROVE-Bench form the PROVE (Perceptual RemOVal cohErence) evaluation framework for visual media. Experiments across diverse image and video benchmarks demonstrate that RC achieves substantially stronger alignment with human judgments than existing evaluation protocols. The code for RC metrics and PROVE-Bench is publicly available at: https://github.com/xiaomi-research/prove/.
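The temporal side, comparing distributions inside the region that adjacent frames both restored, can be sketched in a similarly hedged way. The abstract does not say which distribution or distance RC-T uses, so the intensity histogram, the histogram-intersection similarity, and the names `histogram` and `temporal_consistency` below are assumptions for illustration only, not the published metric.

```python
def histogram(values, bins=8):
    """Normalized histogram of intensities assumed to lie in [0, 1]."""
    counts = [0] * bins
    for v in values:
        idx = min(int(v * bins), bins - 1)  # clamp v == 1.0 into the last bin
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]

def temporal_consistency(frames, masks, bins=8):
    """Toy score in [0, 1]; higher means steadier content in the shared masked region."""
    scores = []
    for t in range(len(frames) - 1):
        # Pixels that were restored in both frame t and frame t + 1.
        shared = [(r, c)
                  for r in range(len(masks[t]))
                  for c in range(len(masks[t][0]))
                  if masks[t][r][c] and masks[t + 1][r][c]]
        if not shared:
            continue  # this pair has no common restored region to compare
        h0 = histogram([frames[t][r][c] for r, c in shared], bins)
        h1 = histogram([frames[t + 1][r][c] for r, c in shared], bins)
        # Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones.
        scores.append(sum(min(a, b) for a, b in zip(h0, h1)))
    if not scores:
        raise ValueError("no adjacent frames share a restored region")
    return sum(scores) / len(scores)

def make_clip(fills, n=4):
    """Toy clip: one frame per fill value, each with the same central 2x2 mask."""
    frames, masks = [], []
    for fill in fills:
        frame = [[0.5] * n for _ in range(n)]
        mask = [[0] * n for _ in range(n)]
        for r in (1, 2):
            for c in (1, 2):
                frame[r][c] = fill
                mask[r][c] = 1
        frames.append(frame)
        masks.append(mask)
    return frames, masks

stable_score = temporal_consistency(*make_clip([0.5, 0.5, 0.5]))   # steady fill
flicker_score = temporal_consistency(*make_clip([0.1, 0.9, 0.1]))  # flickering fill
print(stable_score, flicker_score)
```

Restricting the comparison to the shared masked region is what makes the score sensitive to flicker inside the edit, the kind of localized artifact the abstract says global temporal metrics miss.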
