2603.04191v1 Mar 04, 2026 cs.AI

Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions

Yibo Li

Bryan Hooi

Yue Liu

Qianyu Guo

Abstract

Large Language Models (LLMs) are increasingly serving as personal assistants, where users share complex and diverse preferences over extended interactions. However, assessing how well LLMs can follow these preferences in realistic, long-term situations remains underexplored. This work proposes RealPref, a benchmark for evaluating realistic preference-following in personalized user-LLM interactions. RealPref features 100 user profiles, 1300 personalized preferences, four types of preference expression (ranging from explicit to implicit), and long-horizon interaction histories. It includes three types of test questions (multiple-choice, true-or-false, and open-ended), with detailed rubrics for LLM-as-a-judge evaluation. Results indicate that LLM performance significantly drops as context length grows and preference expression becomes more implicit, and that generalizing user preference understanding to unseen scenarios poses further challenges. RealPref and these findings provide a foundation for future research to develop user-aware LLM assistants that better adapt to individual needs. The code is available at https://github.com/GG14127/RealPref.
