2605.27168v1 May 26, 2026 cs.CL

Grounding Text Embeddings in Stakeholder Associations

Zihao Fu
Chris Russell
Kenneth C. Enevoldsen
Jonathan Rystrøm
Sofie Burgos-Thorsen
Johan Irving Søltoft

Text embeddings are widely used to analyse large corpora of complex texts. However, it is unclear whether these embeddings capture the same semantic distances as the human experts who use them. Ensuring alignment between embedding representations and human intentions is essential for valid analyses. We present the Stakeholder Grounding Exercise, a method for making expert associations explicit and grounding embedding model results in human understanding. In our primary case study on Danish policy issues, we find that neural text embeddings are substantially less reliable than human experts (a 19–26 percentage-point gap), and that this misalignment propagates to downstream clustering performance (Spearman $\rho = 0.9$ between the exercise ranking and cluster quality). A secondary study on US Federal AI use cases replicates the gap (16 percentage points) in English, using a digital protocol and a different community of experts. This demonstrates that the gap is not an artefact of a single instrument or domain. The Stakeholder Grounding Exercise offers a practical method for assessing whether embedding models capture the semantic distinctions that matter most to domain experts.
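The abstract does not specify how expert associations are recorded or scored, so the following is only a minimal sketch under stated assumptions. It assumes (hypothetically) that each expert judgment is a triplet `(anchor, chosen, rejected)`, meaning the expert judged `chosen` as more closely associated with `anchor` than `rejected`. A model "agrees" when cosine similarity between embeddings orders the pair the same way. It also shows how a Spearman $\rho$ between two model rankings (for example, exercise score versus cluster quality) can be computed as the Pearson correlation of tie-averaged ranks. All function names and the triplet format are illustrative, not the authors' protocol.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def triplet_agreement(
    embeddings: Dict[str, Sequence[float]],
    judgments: List[Tuple[str, str, str]],
) -> float:
    """Fraction of (hypothetical) expert triplets the embedding model reproduces.

    Each judgment (anchor, chosen, rejected) says the expert placed `chosen`
    closer to `anchor` than `rejected`; the model agrees if cosine similarity
    ranks them the same way.
    """
    hits = 0
    for anchor, chosen, rejected in judgments:
        a = embeddings[anchor]
        if cosine(a, embeddings[chosen]) > cosine(a, embeddings[rejected]):
            hits += 1
    return hits / len(judgments)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy 2-d "embeddings" for illustration only; not data from the paper.
    emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
    print(triplet_agreement(emb, [("a", "b", "c"), ("a", "c", "b")]))  # 0.5
```

Comparing a model's agreement rate against the experts' own inter-rater agreement on the same triplets would give a gap in percentage points. In practice `scipy.stats.spearmanr` computes the same tie-averaged statistic; the hand-rolled version here just makes the rank step explicit.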
