2606.10569v1 Jun 09, 2026 cs.CL

Hidden Consensus: Preference-Validity Compression in Human Feedback

Katherine Lee

Citations: 4,919

h-index: 8

Dorcas Chia Ern Chua

Citations: 0

h-index: 0

Chee Seng Chan

Citations: 36

h-index: 2

Jiaming Tan

Citations: 2

h-index: 1

Zhen Xue Gue

Citations: 0

h-index: 0

Norzalena Abdul Hamid

Citations: 0

h-index: 0

A. Azmi

Citations: 69

h-index: 4

Keat Mei Yeong

Citations: 0

h-index: 0

Aizat Izyani binti Mujab

Citations: 0

h-index: 0

Hafsa Azam

Citations: 27

h-index: 3

Cheemin Khoo

Citations: 8

h-index: 1

Hansol Lim

Citations: 12

h-index: 1

Standard RLHF pipelines often reduce heterogeneous human judgments to a single scalar reward target. We argue that this reduction can mis-measure alignment in structurally plural societies, where disagreement may reflect culturally, historically, linguistically, regionally, or normatively grounded interpretations rather than annotation noise. We call this failure Preference-Validity Compression: the collapse of multiple plural-valid response options into a single optimization target. Using Malaysia as a diagnostic setting, we analyze RLHF-style feedback aggregation through preference events linking prompts, responses, and acceptability judgments across interpretive frames. Across 321 preference events from 20 participants and 107 trio-annotated prompts, 79% of prompts contain more than one majority-supported response that single-winner aggregation would discard, and apparent dominance gaps between top responses diminish when all majority-supported options are considered. Participants frequently select multiple acceptable responses, and discarded responses demonstrably reflect coherent local, practical, or cultural frames. These findings show that majority aggregation in this corpus measures argmax acceptability rather than plural alignment. We treat this as a measurement-validity issue and argue that future alignment methods should satisfy Validity-Preserving Consistency, remaining stable across plural-valid interpretive frames rather than collapsing them into a single reward target.
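To make the contrast between single-winner aggregation and majority-supported options concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each preference event is the set of responses one annotator marked acceptable, and that "majority-supported" means accepted by at least two of a prompt's three annotators; the abstract does not spell out either definition. The function names and the toy data are hypothetical. Note that three events per prompt over 107 prompts would give 321 events, which matches the reported corpus size.

```python
from collections import Counter


def acceptance_counts(events):
    """Count how many annotators accepted each response.

    `events` is a list of sets, one per annotator (an assumed data layout).
    """
    counts = Counter()
    for accepted in events:
        counts.update(set(accepted))
    return counts


def majority_supported(events, n_annotators=3):
    """Responses accepted by a strict majority of annotators (assumed definition)."""
    threshold = n_annotators // 2 + 1  # 2 of 3 for trio annotation
    counts = acceptance_counts(events)
    return {resp for resp, c in counts.items() if c >= threshold}


def single_winner(events):
    """Argmax response; ties are broken alphabetically for determinism."""
    counts = acceptance_counts(events)
    if not counts:
        return None
    return min(counts, key=lambda resp: (-counts[resp], resp))


# Hypothetical toy prompt: three annotators, three candidate responses.
prompt_events = [{"A", "B"}, {"A", "B", "C"}, {"B"}]

winner = single_winner(prompt_events)
supported = majority_supported(prompt_events)
discarded = supported - {winner}

print(winner)     # B
print(supported)  # contains A and B
print(discarded)  # contains only A
```

In this toy prompt, response A is accepted by two of three annotators yet disappears once only the argmax is kept, which is the kind of loss the abstract describes as Preference-Validity Compression.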

0 Citations

0 Influential

4 Altmetric

20.0 Score
