2601.07449v1 Jan 12, 2026 cs.IR

RLPO: Residual Listwise Preference Optimization for Long-Context Review Ranking

Yichi Zhang (Citations: 2,930; h-index: 8)

Hao Jiang (Citations: 25; h-index: 1)

Zhi Yang (Citations: 69; h-index: 2)

Annan Wang (Citations: 1,926; h-index: 12)

Weisi Lin (Citations: 43; h-index: 3)

Abstract

Review ranking is pivotal in e-commerce for prioritizing diagnostic and authentic feedback from the deluge of user-generated content. While large language models have improved semantic assessment, existing ranking paradigms face a persistent trade-off in long-context settings. Pointwise scoring is efficient but often fails to account for list-level interactions, leading to miscalibrated top-$k$ rankings. Listwise approaches can leverage global context, yet they are computationally expensive and become unstable as candidate lists grow. To address this, we propose Residual Listwise Preference Optimization (RLPO), which formulates ranking as listwise representation-level residual correction over a strong pointwise LLM scorer. RLPO first produces calibrated pointwise scores and item representations, then applies a lightweight encoder over the representations to predict listwise score residuals, avoiding full token-level listwise processing. We also introduce a large-scale benchmark for long-context review ranking with human verification. Experiments show RLPO improves NDCG@k over strong pointwise and listwise baselines and remains robust as list length increases.
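The two-stage idea in the abstract (pointwise scores first, then a list-aware residual computed from item representations) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the encoder, so `listwise_residuals` below is a hypothetical stand-in that projects each item's deviation from the list mean with an assumed weight vector, and all names and toy numbers are made up for illustration. `ndcg_at_k` uses the common exponential-gain form of NDCG, which may differ from the variant used in the paper.

```python
import math
from typing import List, Sequence


def listwise_residuals(reps: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Hypothetical lightweight 'encoder': score each item's deviation
    from the mean representation of the list (an assumption, not the
    paper's architecture)."""
    n = len(reps)
    dim = len(weights)
    mean = [sum(r[d] for r in reps) / n for d in range(dim)]
    return [sum(w * (r[d] - mean[d]) for d, w in enumerate(weights))
            for r in reps]


def residual_corrected_scores(pointwise: Sequence[float],
                              reps: Sequence[Sequence[float]],
                              weights: Sequence[float]) -> List[float]:
    """Final score = pointwise score + list-dependent residual."""
    residuals = listwise_residuals(reps, weights)
    return [s + r for s, r in zip(pointwise, residuals)]


def ndcg_at_k(scores: Sequence[float],
              relevance: Sequence[int], k: int) -> float:
    """NDCG@k with gain 2^rel - 1 and a log2 position discount."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    def dcg(rels: Sequence[int]) -> float:
        return sum((2 ** rel - 1) / math.log2(pos + 2)
                   for pos, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(relevance, reverse=True))
    return dcg([relevance[i] for i in order]) / ideal if ideal > 0 else 0.0


# Toy list of three reviews (all values are illustrative assumptions).
pointwise = [0.9, 0.8, 0.7]                     # pointwise scorer output
reps = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]     # item representations
weights = [0.0, 0.6]                            # assumed encoder weights
relevance = [1, 0, 2]                           # graded helpfulness labels

final = residual_corrected_scores(pointwise, reps, weights)
before = ndcg_at_k(pointwise, relevance, k=3)
after = ndcg_at_k(final, relevance, k=3)
```

In this toy case the third review is under-scored by the pointwise model but stands out from the other two in representation space, so the residual lifts it to the top. Because the stand-in encoder only sees one vector per item rather than every token of every review, its cost grows with the number of candidates, not with their combined length, which is the efficiency argument the abstract makes.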

1 Citations

0 Influential

6 Altmetric

31.0 Score
