2602.24040v1 Feb 27, 2026 cs.LG

RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models

Samuel Stante

Citations: 1

h-index: 1

Florian Redhardt

Citations: 9

h-index: 2

Lena Libon

Citations: 1

h-index: 1

Parnian Kassraie

Citations: 143

h-index: 7

Ido Hakimi

Citations: 300

h-index: 10

Barna Pásztor

Citations: 21

h-index: 3

Andreas Krause

Citations: 9

h-index: 2

Daniel Yang

Citations: 31

h-index: 2

Abstract

Reward models are central to aligning large language models (LLMs) with human preferences. Yet most approaches rely on pointwise reward estimates that overlook the epistemic uncertainty in reward models arising from limited human feedback. Recent work suggests that quantifying this uncertainty can reduce the costs of human annotation via uncertainty-guided active learning and mitigate reward overoptimization in LLM post-training. However, uncertainty-aware reward models have so far been adopted without thorough comparison, leaving them poorly understood. This work introduces a unified framework, RewardUQ, to systematically evaluate uncertainty quantification for reward models. We compare common methods along standard metrics measuring accuracy and calibration, and we propose a new ranking strategy incorporating both dimensions for a simplified comparison. Our experimental results suggest that model size and initialization have the most meaningful impact on performance, and most prior work could have benefited from alternative design choices. To foster the development and evaluation of new methods and aid the deployment in downstream applications, we release our open-source framework as a Python package. Our code is available at https://github.com/lasgroup/rewarduq.
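
To make the two evaluation dimensions named in the abstract concrete, the following is a minimal sketch of how accuracy and calibration might be computed for pairwise preference predictions. It is not the RewardUQ API: every function name is hypothetical, and the use of a small ensemble of reward estimates, the Bradley-Terry link, and expected calibration error (ECE) as the calibration metric are assumptions chosen for illustration. The paper's combined ranking strategy is not reproduced here.

```python
import math
from statistics import mean, pstdev


def preference_probability(reward_a, reward_b):
    """Bradley-Terry probability that response A is preferred over response B."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def ensemble_prediction(rewards_a, rewards_b):
    """Mean preference probability and its spread across ensemble members.

    rewards_a and rewards_b hold one reward per (hypothetical) ensemble member.
    The standard deviation across members serves as a simple uncertainty proxy.
    """
    probs = [preference_probability(a, b) for a, b in zip(rewards_a, rewards_b)]
    return mean(probs), pstdev(probs)


def accuracy(probs, labels):
    """Fraction of pairs where the predicted preference (p >= 0.5) matches the label."""
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: weighted gap between confidence and accuracy over confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        confidence = max(p, 1.0 - p)  # confidence in the predicted side, in [0.5, 1]
        correct = float((p >= 0.5) == bool(y))
        index = min(int((confidence - 0.5) / 0.5 * n_bins), n_bins - 1)
        bins[index].append((confidence, correct))
    total = len(labels)
    ece = 0.0
    for bucket in bins:
        if bucket:
            avg_conf = mean([c for c, _ in bucket])
            avg_acc = mean([k for _, k in bucket])
            ece += len(bucket) / total * abs(avg_conf - avg_acc)
    return ece


if __name__ == "__main__":
    # Toy data: label 1 means response A was preferred by the annotator.
    probs = [0.9, 0.8, 0.3, 0.6]
    labels = [1, 1, 0, 0]
    print("accuracy:", accuracy(probs, labels))
    print("ECE:", expected_calibration_error(probs, labels))
    print("ensemble:", ensemble_prediction([1.2, 0.8, 1.0], [0.1, 0.4, -0.2]))
```

A model can score well on one metric and poorly on the other, which is why a comparison along both dimensions, as the abstract proposes, is more informative than accuracy alone.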

1 Citation

0 Influential

38.862943611199 Altmetric

195.3 Score
