2605.07177v1 May 08, 2026 cs.LG

HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

Xichen Zhang (Citations: 72, h-index: 4)

Yi Xu (Citations: 27, h-index: 2)

Guankai Li (Citations: 30, h-index: 1)

Jiabin Chen (Citations: 64, h-index: 2)

Yuanfei Lu (Citations: 13, h-index: 1)

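Existing multimodal search agents handle target entities one at a time, issuing one tool call per entity and repeating unnecessary interaction rounds whenever a query decomposes into independent sub-retrievals. We argue that an effective multimodal agent should search wider, that is, dispatch multiple grounded queries concurrently within a single round. To this end, we propose HyperEyes, a parallel multimodal search agent that fuses visual grounding and retrieval into one atomic action, which enables concurrent search over multiple entities and treats inference efficiency as a primary training objective. HyperEyes is trained in two stages. For cold-start training, we develop a parallel-amenable data synthesis pipeline covering visual multi-entity and textual multi-constraint queries, and build efficiency-oriented training trajectories through progressive rejection sampling. The core contribution is a dual-grained efficiency-aware reinforcement learning framework that operates at two levels. At the macro level, we propose TRACE (Tool-use Reference-Adaptive Cost Efficiency), a trajectory-level reward whose reference is tightened steadily during training, suppressing superfluous tool calls without restricting genuine multi-hop search. At the micro level, we use on-policy distillation to inject dense token-level corrective signals from an external teacher model on failed rollouts, easing the credit-assignment problem of sparse outcome rewards. Because existing benchmarks use accuracy as the sole metric and ignore inference cost, we introduce IMEB, a human-curated benchmark of 300 instances that evaluates search capability and efficiency together. Across six benchmarks, HyperEyes-30B exceeds the strongest comparable open-source agent by 9.9% in accuracy while using 5.3x fewer tool-call rounds on average.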
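The contrast between sequential and parallel dispatch can be illustrated with a small sketch. This is not the HyperEyes implementation: `grounded_search` is a hypothetical stand-in for one fused grounding-and-retrieval tool call, and the entity names are made up. It only shows how independent sub-retrievals collapse from one round per entity into a single round.

```python
import asyncio


async def grounded_search(entity: str) -> str:
    """Hypothetical stand-in for one fused ground-and-retrieve tool call."""
    await asyncio.sleep(0.01)  # simulate tool latency
    return f"result for {entity}"


async def sequential_agent(entities):
    """One tool call per round, so rounds grow with the number of entities."""
    results, rounds = [], 0
    for entity in entities:
        results.append(await grounded_search(entity))
        rounds += 1
    return results, rounds


async def parallel_agent(entities):
    """Dispatch all independent sub-retrievals concurrently in one round."""
    results = await asyncio.gather(*(grounded_search(e) for e in entities))
    return list(results), 1


def run_demo():
    # Illustrative entities only; any independent targets would do.
    entities = ["tower", "bridge", "statue"]
    sequential = asyncio.run(sequential_agent(entities))
    parallel = asyncio.run(parallel_agent(entities))
    return sequential, parallel
```

Both agents return the same results in the same order, since `asyncio.gather` preserves input order; only the number of interaction rounds differs.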

Original Abstract

Existing multimodal search agents process target entities sequentially, issuing one tool call per entity and accumulating redundant interaction rounds whenever a query decomposes into independent sub-retrievals. We argue that effective multimodal agents should search wider rather than longer: dispatching multiple grounded queries concurrently within a round. To this end, we present HyperEyes, a parallel multimodal search agent that fuses visual grounding and retrieval into a single atomic action, enabling concurrent search across multiple entities while treating inference efficiency as a first-class training objective. HyperEyes is trained in two stages. For cold-start supervision, we develop a Parallel-Amenable Data Synthesis Pipeline covering visual multi-entity and textual multi-constraint queries, curating efficiency-oriented trajectories via Progressive Rejection Sampling. Building on this, our central contribution, a Dual-Grained Efficiency-Aware Reinforcement Learning framework, operates at two levels. At the macro level, we propose TRACE (Tool-use Reference-Adaptive Cost Efficiency), a trajectory-level reward whose reference is monotonically tightened during training to suppress superfluous tool calls without restricting genuine multi-hop search. At the micro level, we adapt On-Policy Distillation to inject dense token-level corrective signals from an external teacher on failed rollouts, mitigating the credit-assignment deficiency of sparse outcome rewards. Since existing benchmarks evaluate accuracy as the sole metric, omitting inference cost, we introduce IMEB, a human-curated benchmark of 300 instances that jointly evaluates search capability and efficiency. Across six benchmarks, HyperEyes-30B surpasses the strongest comparable open-source agent by 9.9% in accuracy with 5.3x fewer tool-call rounds on average.
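The abstract does not state the TRACE formula, so the following is only a hypothetical sketch of the general idea: a trajectory-level reward measured against a reference round count that can only decrease during training. The class `ReferenceTracker`, the function `efficiency_reward`, the linear penalty, and the rule of tightening only on correct rollouts are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceTracker:
    """Per-query reference round count that never increases (assumed design)."""
    initial_reference: int
    refs: dict = field(default_factory=dict)

    def get(self, query_id: str) -> int:
        return self.refs.get(query_id, self.initial_reference)

    def update(self, query_id: str, rounds: int, correct: bool) -> None:
        # Tighten only on correct rollouts, so the reference never drops
        # below what a successful search actually needed.
        if correct:
            self.refs[query_id] = min(self.get(query_id), rounds)


def efficiency_reward(correct: bool, rounds: int, reference: int,
                      penalty: float = 0.1) -> float:
    """Outcome reward discounted for tool-call rounds beyond the reference."""
    if not correct:
        return 0.0
    excess = max(0, rounds - reference)  # rounds within the reference are free
    return max(0.0, 1.0 - penalty * excess)
```

Under these assumptions, a correct trajectory that stays within the reference keeps the full reward, which is one way to avoid penalizing genuine multi-hop search, while extra rounds beyond the reference are discounted.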

Citations: 1 · Influential: 0 · Altmetric: 2 · Score: 11.0