2603.21613v1 Mar 23, 2026 cs.IR

AgenticRec: End-to-End Tool-Integrated Policy Optimization for Ranking-Oriented Recommender Agents

Tianyi Li, Zixuan Wang, Guidong Lei, Xiaodong Li, Hui Li

Abstract

Recommender agents built on Large Language Models offer a promising paradigm for recommendation. However, existing recommender agents typically suffer from a disconnect between intermediate reasoning and final ranking feedback, and are unable to capture fine-grained preferences. To address this, we present AgenticRec, a ranking-oriented agentic recommendation framework that optimizes the entire decision-making trajectory (including intermediate reasoning, tool invocation, and final ranking list generation) under sparse implicit feedback. Our approach makes three key contributions. First, we design a suite of recommendation-specific tools integrated into a ReAct loop to support evidence-grounded reasoning. Second, we propose theoretically unbiased List-Wise Group Relative Policy Optimization (list-wise GRPO) to maximize ranking utility, ensuring accurate credit assignment for complex tool-use trajectories. Third, we introduce Progressive Preference Refinement (PPR) to resolve fine-grained preference ambiguities. By mining hard negatives from ranking violations and applying bidirectional preference alignment, PPR minimizes the convex upper bound of pairwise ranking errors. Experiments on benchmarks confirm that AgenticRec significantly outperforms baselines, validating the necessity of unifying reasoning, tool use, and ranking optimization.
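The abstract names two optimization ingredients without giving formulas: a group-relative, list-wise policy signal and a convex upper bound on pairwise ranking errors. The sketch below illustrates the general shape of both under stated assumptions and is not the authors' method. It assumes NDCG@k with binary relevance as the list-wise reward, the standard mean/std group normalization of vanilla GRPO (the paper's "theoretically unbiased" variant may differ), and a base-2 logistic loss as one common convex surrogate. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Set


def ndcg_at_k(ranked_items: Sequence[str], positives: Set[str], k: int) -> float:
    """NDCG@k with binary relevance (assumed list-wise reward, not from the paper)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in positives
    )
    ideal_hits = min(len(positives), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standard GRPO-style advantage: each reward centered and scaled within its group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def pairwise_logistic_loss(pos_score: float, neg_score: float) -> float:
    """log2(1 + exp(-(s_pos - s_neg))): convex in the margin, and at least 1
    whenever the negative is scored at or above the positive, so it
    upper-bounds the 0/1 pairwise ranking error."""
    margin = pos_score - neg_score
    # Numerically stable softplus(-margin), converted from nats to bits.
    softplus = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return softplus / math.log(2)


# Three sampled ranking lists for one user whose only held-out positive is "i3".
positives = {"i3"}
sampled_rankings = [
    ["i3", "i1", "i2"],  # positive ranked first
    ["i1", "i3", "i2"],  # positive ranked second
    ["i1", "i2", "i4"],  # positive missing
]
rewards = [ndcg_at_k(r, positives, k=3) for r in sampled_rankings]
advantages = group_relative_advantages(rewards)
```

In a full training loop, each advantage would weight the log-probability of the whole trajectory that produced the corresponding list (reasoning, tool calls, and final ranking), and the pairwise loss would be applied to hard negatives that the current policy ranks above a positive. How the paper mines those negatives and applies bidirectional alignment is not specified in the abstract.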
