2604.04144v1 Apr 05, 2026 cs.CL

Diverse Preferences, Limited Policies: Toward Scalable Language Model Personalization

Many Preferences, Few Policies: Towards Scalable Language Model Personalization

Milind Tambe (Citations: 329, h-index: 8)

Cheol Woo Kim (Citations: 20, h-index: 3)

Jai Moondra (Citations: 110, h-index: 5)

Roozbeh Nahavandi (Citations: 0, h-index: 0)

Andrew Perrault (Citations: 44, h-index: 4)

Swati Gupta (Citations: 99, h-index: 5)

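The ultimate goal of language model personalization is to give each user a single language model that is perfectly aligned with that user's preferences. However, maintaining a separate language model for every user is impractical in terms of compute, memory, and system complexity. To address this problem, we develop a principled method for selecting a small portfolio of language models that captures representative behaviors across diverse users. We model user preferences over traits such as safety, humor, and brevity as a multi-dimensional weight vector. Given reward functions along these dimensions, our algorithm PALM (Portfolio of Aligned LLMs) generates a small portfolio of language models such that, for any weight vector, the portfolio contains a model that is near-optimal for the corresponding scalarized objective. To the best of our knowledge, this is the first result to provide theoretical guarantees on both the size and the approximation quality of language model portfolios for personalization. It also characterizes the trade-off between system cost and personalization, and the diversity of language models needed to cover the range of user preferences. We present empirical results that validate these guarantees and show greater output diversity than common baselines.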

Original Abstract

The holy grail of LLM personalization is a single LLM for each user, perfectly aligned with that user's preferences. However, maintaining a separate LLM per user is impractical due to constraints on compute, memory, and system complexity. We address this challenge by developing a principled method for selecting a small portfolio of LLMs that captures representative behaviors across heterogeneous users. We model user preferences across multiple traits (e.g., safety, humor, brevity) through a multi-dimensional weight vector. Given reward functions across these dimensions, our algorithm PALM (Portfolio of Aligned LLMs) generates a small portfolio of LLMs such that, for any weight vector, the portfolio contains a near-optimal LLM for the corresponding scalarized objective. To the best of our knowledge, this is the first result that provides theoretical guarantees on both the size and approximation quality of LLM portfolios for personalization. It characterizes the trade-off between system cost and personalization, as well as the diversity of LLMs required to cover the landscape of user preferences. We provide empirical results that validate these guarantees and demonstrate greater output diversity over common baselines.
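
The portfolio idea in the abstract can be illustrated with a small sketch. This is not PALM; the abstract does not describe the algorithm's internals. The sketch swaps in a plain greedy set-cover heuristic over a discretized weight simplex, and it rests on several assumptions that are not from the paper: each candidate policy is summarized by a fixed, nonnegative reward vector over (safety, humor, brevity); "near-optimal" is read multiplicatively, meaning a policy covers a weight vector w when its scalarized reward is at least alpha times the best achievable for w; and the candidate names and reward numbers are hypothetical.

```python
from itertools import product

# Hypothetical candidates: name -> assumed rewards for (safety, humor, brevity).
CANDIDATES = {
    "safe": (1.0, 0.2, 0.3),
    "funny": (0.2, 1.0, 0.3),
    "brief": (0.3, 0.2, 1.0),
    "balanced": (0.7, 0.7, 0.7),
    "safe_brief": (0.8, 0.1, 0.8),
}


def scalarized(weights, rewards):
    """Scalarized objective: weighted sum of per-trait rewards."""
    return sum(w * r for w, r in zip(weights, rewards))


def best_policy(weights, policies):
    """Name of the policy with the highest scalarized reward for these weights."""
    return max(policies, key=lambda name: scalarized(weights, policies[name]))


def simplex_grid(dim, steps):
    """Weight vectors with entries k/steps whose integer numerators sum to steps."""
    return [
        tuple(k / steps for k in combo)
        for combo in product(range(steps + 1), repeat=dim)
        if sum(combo) == steps
    ]


def greedy_portfolio(policies, weight_grid, alpha):
    """Greedy set cover (an illustrative stand-in, not PALM).

    Repeatedly adds the policy that alpha-covers the most still-uncovered
    grid weights. Assumes nonnegative rewards and 0 < alpha <= 1, so the
    per-weight optimum always covers its own weight and the loop terminates.
    """
    optimum = {
        w: scalarized(w, policies[best_policy(w, policies)]) for w in weight_grid
    }

    def covers(name, w):
        return scalarized(w, policies[name]) >= alpha * optimum[w]

    uncovered = set(weight_grid)
    portfolio = []
    while uncovered:
        name = max(policies, key=lambda n: sum(covers(n, w) for w in uncovered))
        portfolio.append(name)
        uncovered -= {w for w in uncovered if covers(name, w)}
    return portfolio


if __name__ == "__main__":
    grid = simplex_grid(dim=3, steps=10)
    chosen = greedy_portfolio(CANDIDATES, grid, alpha=0.8)
    print("portfolio:", chosen)
    # Serving a user: route to the portfolio member that is best for their weights.
    user_weights = (0.6, 0.1, 0.3)
    subset = {name: CANDIDATES[name] for name in chosen}
    print("served by:", best_policy(user_weights, subset))
```

Lowering alpha in this sketch can only keep the portfolio the same size or let it shrink at the cost of looser per-user quality, which mirrors the trade-off between system cost and personalization that the abstract mentions. The grid is only a finite proxy for "any weight vector", so the sketch carries no guarantee off the grid.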

0 Citations

0 Influential

4 Altmetric

20.0 Score
