2604.07851v1 Apr 09, 2026 cs.IR

ReRec: Reasoning-Augmented LLM-based Recommendation Assistant via Reinforcement Fine-tuning

Wenqi Fan

Citations: 1,632

h-index: 13

Jiani Huang

Citations: 93

h-index: 4

Shijie Wang

The Hong Kong Polytechnic University

Citations: 1,230

h-index: 9

Liang-bo Ning

Citations: 1,287

h-index: 9

Qing Li

Citations: 99

h-index: 5

Abstract

With the rise of LLMs, there is an increasing need for intelligent recommendation assistants that can handle complex queries and provide personalized, reasoning-driven recommendations. LLM-based recommenders show potential but struggle with multi-step reasoning, underscoring the need for reasoning-augmented systems. To address this gap, we propose ReRec, a novel reinforcement fine-tuning (RFT) framework designed to improve LLM reasoning in complex recommendation tasks. Our framework introduces three key components: (1) Dual-Graph Enhanced Reward Shaping, which integrates recommendation metrics such as NDCG@K with Query Alignment and Preference Alignment Scores to provide fine-grained reward signals for LLM optimization; (2) Reasoning-aware Advantage Estimation, which decomposes LLM outputs into reasoning segments and penalizes incorrect steps to improve the reasoning behind recommendations; and (3) Online Curriculum Scheduler, which dynamically assesses query difficulty and organizes the training curriculum to ensure stable learning during RFT. Experiments demonstrate that ReRec outperforms state-of-the-art baselines and preserves core abilities such as instruction-following and general knowledge. Our code is available at https://github.com/jiani-huang/ReRec.
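
To make the reward-shaping idea concrete, the following is a minimal sketch and not the authors' implementation. NDCG@K is computed in its standard binary-relevance form; the `shaped_reward` function, its additive combination, and the weights `w_query` and `w_pref` are hypothetical, since the abstract does not state how the Query Alignment and Preference Alignment Scores are combined with NDCG@K or how they are derived from the two graphs.

```python
import math
from typing import Hashable, Sequence, Set


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (rank 1 has discount log2(2))."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Standard binary-relevance NDCG@K for one recommendation list."""
    seen = set()
    gains = []
    for item in ranked:
        # Count each relevant item only once so duplicates cannot inflate the score.
        hit = item in relevant and item not in seen
        gains.append(1.0 if hit else 0.0)
        seen.add(item)
    ideal_dcg = dcg_at_k([1.0] * min(len(relevant), k), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def shaped_reward(
    ndcg: float,
    query_alignment: float,
    preference_alignment: float,
    w_query: float = 0.5,  # hypothetical weight, not taken from the paper
    w_pref: float = 0.5,   # hypothetical weight, not taken from the paper
) -> float:
    """Assumed additive combination of the ranking metric and the two alignment scores."""
    return ndcg + w_query * query_alignment + w_pref * preference_alignment


if __name__ == "__main__":
    score = ndcg_at_k(["item_b", "item_a", "item_c"], {"item_a"}, k=3)
    print(shaped_reward(score, query_alignment=0.8, preference_alignment=0.6))
```

Here the alignment scores are simply passed in as numbers in [0, 1]; a fine-grained signal of this kind gives partial credit to a list that misses the target item but still matches the query and the user's preferences, which a pure ranking metric would score as zero.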

