2602.10863v1 Feb 11, 2026 cs.LG

ICA: An Efficient Credit Assignment Method for Information-Seeking Agents That Use Visual Information

ICA: Information-Aware Credit Assignment for Visually Grounded Long-Horizon Information-Seeking Agents

Cong Pang

Citations: 2

h-index: 1

Xuyu Feng

Citations: 18

h-index: 3

Yujie Yi

Citations: 11

h-index: 2

Zixuan Chen

Citations: 1,533

h-index: 1

Jiawei Hong

Citations: 553

h-index: 4

Tiankuo Yao

Citations: 1

h-index: 1

Nang Yuan

Citations: 9

h-index: 1

Lewei Lu

Citations: 3

h-index: 1

Xin Lou

Citations: 2

h-index: 1

Jiapeng Luo

Citations: 3,071

h-index: 8

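Although information-seeking agents trained with reinforcement learning perform strongly, learning in open-ended web environments is still severely constrained by a low signal-to-noise ratio. Text-based parsers often ignore layout semantics and introduce unstructured noise, and long-horizon training generally relies on sparse outcome rewards, which obscures which retrieval actions actually matter. In this work, we propose a visually grounded search framework that represents webpages as visual snapshots, so that agents can use layout cues to quickly locate important evidence and filter out irrelevant information. To learn effectively from these high-dimensional observations, we propose Information-Aware Credit Assignment (ICA), a post-hoc credit assignment method that uses posterior analysis to estimate how much each retrieved snapshot contributes to the final outcome and passes dense learning signals to the key search turns. Combined with a GRPO-based training pipeline, the approach consistently outperforms existing text-based methods across a range of information-seeking benchmarks. This indicates that visual snapshot grounding and information-level credit assignment are effective in alleviating the credit assignment problem in open-ended web environments. The code and datasets are to be released at https://github.com/pc-inno/ICA_MM_deepsearch.git.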

Original Abstract

Despite the strong performance achieved by reinforcement learning-trained information-seeking agents, learning in open-ended web environments remains severely constrained by low signal-to-noise feedback. Text-based parsers often discard layout semantics and introduce unstructured noise, while long-horizon training typically relies on sparse outcome rewards that obscure which retrieval actions actually matter. We propose a visual-native search framework that represents webpages as visual snapshots, allowing agents to leverage layout cues to quickly localize salient evidence and suppress distractors. To learn effectively from these high-dimensional observations, we introduce Information-Aware Credit Assignment (ICA), a post-hoc method that estimates each retrieved snapshot's contribution to the final outcome via posterior analysis and propagates dense learning signals back to key search turns. Integrated with a GRPO-based training pipeline, our approach consistently outperforms text-based baselines on diverse information-seeking benchmarks, providing evidence that visual snapshot grounding with information-level credit assignment alleviates the credit-assignment bottleneck in open-ended web environments. The code and datasets will be released in https://github.com/pc-inno/ICA_MM_deepsearch.git.
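
The abstract does not say how ICA computes or applies its per-snapshot contribution estimates. As a rough illustration only, the Python sketch below shows one way a group-relative (GRPO-style) outcome advantage could be redistributed over the search turns of a rollout. The function names, the `info_scores` input, the `floor` parameter, and the weighting rule are all hypothetical assumptions and are not taken from the paper or its code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize outcome rewards within one group of rollouts (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def turn_level_advantages(advantage, info_scores, floor=0.1):
    """Spread one rollout-level advantage over that rollout's search turns.

    ASSUMPTION: info_scores are non-negative per-turn contribution estimates
    (for example, how much a retrieved snapshot supported the final answer).
    How such scores are obtained is not specified here.
    """
    n = len(info_scores)
    if n == 0:
        return []
    total = sum(info_scores)
    if total == 0:
        weights = [1.0 / n] * n  # no information signal: fall back to uniform
    else:
        weights = [s / total for s in info_scores]
    # Blend with a uniform share so that no turn is left without any signal.
    mixed = [floor / n + (1.0 - floor) * w for w in weights]
    # Scale by n so the mean over turns equals the rollout-level advantage.
    return [advantage * n * m for m in mixed]


if __name__ == "__main__":
    # Four rollouts for the same question; 1.0 means a correct final answer.
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    # Hypothetical information scores for the three search turns of rollout 0.
    print(turn_level_advantages(advantages[0], [0.0, 0.8, 0.2]))
```

Scaling by the number of turns keeps the average per-turn signal equal to the original outcome advantage, so in this sketch the information scores only change how credit is distributed within a rollout, not how much credit the rollout receives overall.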

1 Citation

0 Influential

24 Altmetric

121.0 Score


