2603.20939v1 Mar 21, 2026 cs.CL

User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction

Shuhaib Mehri (Citations: 50, h-index: 4)

Dilek Hakkani-Tur (Citations: 574, h-index: 11)

Yuren Hao (Citations: 4, h-index: 1)

C. Zhai (Citations: 17, h-index: 2)

Abstract

Large language models are increasingly used as personal assistants, yet most lack a persistent user model, forcing users to restate their preferences in every session. We propose Vector-Adapted Retrieval Scoring (VARS), a pipeline-agnostic, frozen-backbone framework that represents each user with a long-term and a short-term vector in a shared preference space and uses these vectors to bias retrieval scoring over a structured preference memory. The vectors are updated online from weak scalar rewards derived from user feedback, enabling personalization without per-user fine-tuning. We evaluate on MultiSessionCollab, an online multi-session collaboration benchmark with rich user preference profiles, across math and code tasks. Under frozen backbones, the main benefit of user-aware retrieval is improved interaction efficiency rather than large gains in raw task accuracy: our full VARS agent achieves the strongest overall performance, matches a strong Reflection baseline in task success, and reduces timeout rate and user effort. The learned long-term vectors also align with cross-user preference overlap, while the short-term vectors capture session-specific adaptation, supporting the interpretability of the dual-vector design. Code, model, and data are available at https://github.com/YurenHao0426/VARS.
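The abstract names three mechanisms without giving formulas: dual user vectors, a user-dependent bias on retrieval scores, and online updates from a weak scalar reward. The Python sketch below is one plausible reading, not the authors' implementation. Everything in it is an assumption made for illustration: the names `UserState`, `score`, `retrieve`, and `update`, the additive form of the bias, the reward-weighted update rule, the learning rates, and the per-session reset of the short-term vector.

```python
from dataclasses import dataclass, field


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class UserState:
    """Hypothetical per-user state: two vectors in a shared preference space."""
    dim: int
    long_term: list = field(init=False)
    short_term: list = field(init=False)

    def __post_init__(self):
        self.long_term = [0.0] * self.dim
        self.short_term = [0.0] * self.dim

    def start_session(self):
        # Assumption: the short-term vector is cleared at each new session,
        # while the long-term vector persists across sessions.
        self.short_term = [0.0] * self.dim


def score(base_relevance, item_vec, user, w_long=1.0, w_short=1.0):
    """Query-only relevance plus a user-specific bias (assumed additive)."""
    bias = (w_long * dot(user.long_term, item_vec)
            + w_short * dot(user.short_term, item_vec))
    return base_relevance + bias


def retrieve(memory, base_scores, user, k=1):
    """Return the indices of the k highest-scoring memory items."""
    ranked = sorted(
        range(len(memory)),
        key=lambda i: score(base_scores[i], memory[i], user),
        reverse=True,
    )
    return ranked[:k]


def update(user, used_vecs, reward, lr_long=0.05, lr_short=0.5):
    """Shift both vectors toward (reward > 0) or away from (reward < 0)
    the items that were retrieved for the turn being rated."""
    for vec in used_vecs:
        for d in range(user.dim):
            user.long_term[d] += lr_long * reward * vec[d]
            user.short_term[d] += lr_short * reward * vec[d]
```

In this sketch the backbone model is never touched: personalization lives entirely in the two small vectors, which matches the frozen-backbone, no-fine-tuning setting the abstract describes. The larger short-term learning rate is a design guess intended to let within-session feedback dominate quickly while the long-term vector drifts slowly.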
