2602.08603v1 Feb 09, 2026 cs.AI

OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval

Teng Wang (Citations: 24; h-index: 2)

Rong Shan (Citations: 330; h-index: 8)

Jianping Zhang (Citations: 132; h-index: 4)

Weinan Zhang (Citations: 571; h-index: 14)

Zhaoxiang Wang (Citations: 36; h-index: 3)

Jianghao Lin, Shanghai Jiao Tong University (Citations: 1,555; h-index: 20)

Junjie Wu (Citations: 33; h-index: 3)

Tianyi Xu (Citations: 39; h-index: 2)

Wenteng Chen (Citations: 71; h-index: 2)

Changwang Zhang (Citations: 293; h-index: 11)

Jun Wang (Citations: 3; h-index: 1)

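Composed image retrieval (CIR) requires complex reasoning over heterogeneous visual and textual constraints. Existing approaches fall broadly into two paradigms: unified embedding retrieval, which suffers from the myopia of a single model, and heuristic agentic retrieval, whose performance is limited by unoptimized, trial-and-error orchestration. To address this, we propose OSCAR, an optimization-steered agentic planning framework for composed image retrieval. This is the first work to reformulate agentic CIR from a heuristic search process into a principled trajectory optimization problem. Rather than relying on heuristic trial-and-error exploration, OSCAR adopts a new offline-online paradigm. In the offline phase, CIR is modeled as atomic retrieval selection and composition and cast as a two-stage mixed-integer programming problem; optimal trajectories that maximize ground-truth coverage on the training samples are derived mathematically through rigorous boolean set operations. These trajectories are stored in a golden library and used as in-context demonstrations that steer the VLM planner during online inference. Extensive experiments on three public benchmarks and one private industrial benchmark show that OSCAR consistently outperforms state-of-the-art baselines. Notably, it achieves superior performance with only 10% of the training data, which demonstrates strong generalization of the planning logic rather than dataset-specific memorization.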

Original Abstract

Composed image retrieval (CIR) requires complex reasoning over heterogeneous visual and textual constraints. Existing approaches largely fall into two paradigms: unified embedding retrieval, which suffers from single-model myopia, and heuristic agentic retrieval, which is limited by suboptimal, trial-and-error orchestration. To address these limitations, we propose OSCAR, an optimization-steered agentic planning framework for composed image retrieval. We are the first to reformulate agentic CIR from a heuristic search process into a principled trajectory optimization problem. Instead of relying on heuristic trial-and-error exploration, OSCAR employs a novel offline-online paradigm. In the offline phase, we model CIR via atomic retrieval selection and composition as a two-stage mixed-integer programming problem, mathematically deriving optimal trajectories that maximize ground-truth coverage for training samples via rigorous boolean set operations. These trajectories are then stored in a golden library, where they serve as in-context demonstrations that steer the VLM planner at online inference time. Extensive experiments on three public benchmarks and a private industrial benchmark show that OSCAR consistently outperforms state-of-the-art (SOTA) baselines. Notably, it achieves superior performance using only 10% of the training data, demonstrating strong generalization of planning logic rather than dataset-specific memorization.
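To make the offline phase more concrete, here is a minimal Python sketch of selecting and composing atomic retrievals with boolean set operations so that ground-truth coverage is maximized. It is not the paper's formulation: brute-force enumeration stands in for a mixed-integer programming solver, and the composition rule (intersect the included sets, subtract the excluded ones), the tie-breaking objective, the retriever names, the image ids, and the `golden_library` record layout are all assumptions made for illustration.

```python
from itertools import combinations


def subsets(names, max_size):
    """Yield every tuple of `names` with at most `max_size` elements."""
    for k in range(max_size + 1):
        yield from combinations(names, k)


def best_plan(atomic, truth, max_size=2):
    """Enumerate (include, exclude) plans and keep the best one.

    Candidate set = intersection(included sets) - union(excluded sets).
    Assumed objective: cover as many ground-truth ids as possible,
    then prefer the smaller candidate set.
    """
    names = sorted(atomic)
    best = None
    for include in subsets(names, max_size):
        if not include:
            continue  # a plan needs at least one atomic retrieval
        pool = set.intersection(*(atomic[n] for n in include))
        rest = [n for n in names if n not in include]
        for exclude in subsets(rest, max_size):
            removed = set().union(*(atomic[n] for n in exclude))
            candidates = pool - removed
            score = (len(candidates & truth), -len(candidates))
            if best is None or score > best["score"]:
                best = {"score": score, "include": include,
                        "exclude": exclude, "candidates": candidates}
    return best


# Hypothetical atomic retrieval results (sets of image ids).
atomic = {
    "similar_to_reference": {1, 2, 3, 4, 5, 6},
    "matches_red": {2, 4, 6, 8},
    "has_logo": {2, 9},
}
truth = {4, 6}  # hypothetical ground-truth targets for one training sample

plan = best_plan(atomic, truth)

# Hypothetical golden-library record, later usable as an in-context example.
golden_library = [{"query": "same item in red, without the logo", "plan": plan}]
print(plan["include"], plan["exclude"], sorted(plan["candidates"]))
```

Enumeration is only feasible for a handful of atomic retrievals; a solver-based formulation, as the abstract describes, would be the natural choice at scale.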
