2606.10569v1 Jun 09, 2026 cs.CL

Hidden Consensus: Preference-Validity Compression in Human Feedback

Katherine Lee
Dorcas Chia Ern Chua
Chee Seng Chan
Jiaming Tan
Zhen Xue Gue
Norzalena Abdul Hamid
A. Azmi
Keat Mei Yeong
Aizat Izyani binti Mujab
Hafsa Azam
Cheemin Khoo
Hansol Lim

Standard RLHF pipelines often reduce heterogeneous human judgments into a single scalar reward target. We argue that this reduction can mis-measure alignment in structurally plural societies, where disagreement may reflect culturally, historically, linguistically, regionally, or normatively grounded interpretations rather than annotation noise. We call this failure Preference-Validity Compression, the collapse of multiple plural-valid response options into a single optimization target. Using Malaysia as a diagnostic setting, we analyze RLHF-style feedback aggregation through preference events linking prompts, responses, and acceptability judgments across interpretive frames. Across 321 preference events from 20 participants and 107 trio-annotated prompts, 79% of prompts contain more than one majority-supported response that single-winner aggregation would discard, and apparent dominance gaps between top responses diminish when all majority-supported options are considered. Participants frequently select multiple acceptable responses, and discarded responses demonstrably reflect coherent local, practical, or cultural frames. These findings show that majority aggregation in this corpus measures argmax acceptability rather than plural alignment. We treat this as a measurement-validity issue and argue that future alignment methods should satisfy Validity-Preserving Consistency, remaining stable across plural-valid interpretive frames rather than collapsing them into a single reward target.
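The contrast between single-winner aggregation and keeping every majority-supported response can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define "majority-supported" precisely, so the sketch assumes it means acceptance by a strict majority of a prompt's annotators. The function names, the tie-breaking rule, and the example selections are all hypothetical.

```python
from collections import Counter


def acceptance_counts(selections):
    """Count how many annotators marked each response as acceptable."""
    return Counter(r for chosen in selections for r in set(chosen))


def single_winner(selections):
    """Argmax aggregation: keep only the most-accepted response.

    Ties are broken alphabetically (an assumption, for determinism).
    """
    counts = acceptance_counts(selections)
    return min(counts, key=lambda r: (-counts[r], r))


def majority_supported(selections):
    """Keep every response accepted by a strict majority of annotators.

    Assumed reading of "majority-supported"; the paper may define it differently.
    """
    counts = acceptance_counts(selections)
    n_annotators = len(selections)
    return {r for r, c in counts.items() if c > n_annotators / 2}


# Hypothetical trio-annotated prompt: each annotator may accept several responses.
selections = [{"A", "B"}, {"A", "B"}, {"B", "C"}]

winner = single_winner(selections)
plural = majority_supported(selections)
discarded = plural - {winner}  # majority-supported, yet dropped by argmax
```

In this toy case both A and B have majority support, but argmax aggregation keeps only B. The discarded set is the kind of response the abstract says single-winner aggregation would throw away.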
