2601.05882v1 Jan 09, 2026 cs.CL

An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift

Constantinos F. Karouzos

Xingwei Tan

Nikolaos Aletras

Original Abstract

Preference tuning aligns pretrained language models to human judgments of quality, helpfulness, or safety by optimizing over explicit preference signals rather than likelihood alone. Prior work has shown that preference tuning degrades performance and reduces helpfulness when evaluated outside the training domain. However, the extent to which adaptation strategies mitigate this domain shift remains unexplored. We address this challenge by conducting a comprehensive and systematic study of alignment generalization under domain shift. We compare five popular alignment objectives and various adaptation strategies from source to target, including target-domain supervised fine-tuning and pseudo-labeling, across summarization and question-answering helpfulness tasks. Our findings reveal systematic differences in generalization across alignment objectives under domain shift. We show that adaptation strategies based on pseudo-labeling can substantially reduce domain-shift degradation.
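
To make the contrast between "explicit preference signals" and "likelihood alone" concrete, the following is a minimal sketch of a pairwise preference loss next to a plain likelihood loss. The abstract does not name the five alignment objectives it compares, so the DPO-style formula here is only a familiar illustration and an assumption, not necessarily one of the paper's objectives. The function names, the `beta` default, and the example log-probabilities are all hypothetical.

```python
import math


def nll_loss(chosen_logp: float) -> float:
    """Likelihood-only objective: raise the log-probability of one target response."""
    return -chosen_logp


def dpo_style_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical default; scales the preference margin
) -> float:
    """Pairwise preference objective (DPO-style, used here as an assumed example).

    Each argument is a sequence log-probability of a response under the
    policy being tuned or under a frozen reference model.
    """
    # How much more the policy favors the chosen response over the rejected
    # one, measured relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(beta * margin)) equals softplus(-beta * margin);
    # this form avoids overflow for large |margin|.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Hypothetical log-probabilities for one (chosen, rejected) pair.
    print(nll_loss(-4.0))
    print(dpo_style_loss(-4.0, -6.0, -5.0, -5.0))
```

The likelihood loss looks at a single response, whereas the pairwise loss depends on the gap between a preferred and a dispreferred response. It shrinks as the policy separates the two further than the reference model does.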
