2605.29685v1 May 28, 2026 cs.AI

NICE: A Theory-Grounded Diagnostic Benchmark for Social Intelligence of LLMs

Zaifeng Gao
Yixuan Wang
Yunjin Qi
Zhaojun Jiang
Hanxi Pan
Xiangting Ji
Churu Yu
Chunyuan Zheng
Yingze Chen
Jie He
Liuqing Chen
Xuan Wu
Yanfang Liu

As large language models (LLMs) are increasingly deployed in social contexts such as emotional companionship and customer service, measuring their social intelligence has become critical to the quality and safety of human-AI interaction. However, existing social intelligence benchmarks lack a unified framework for organizing social abilities, and therefore cannot support fine-grained diagnosis. To build the first holistic diagnostic evaluation grounded in social theory, we construct a social intelligence framework through a literature review and multi-stage expert validation guided by psychometric principles. The resulting framework comprises 4 categories and 11 dimensions, each further specified by fine-grained capability facets. Building on this framework, we introduce NICE (Norm, Interaction, Cognition, Experience), a diagnostic benchmark of 137 items operationalized through representative Chinese contexts. In an evaluation of 5 frontier LLMs against a human reference group, the models score higher in aggregate accuracy yet show a consistent weakness in Communication, which the framework localizes to 3 specific capability facets: multi-turn communication, nonverbal communication, and synchrony. NICE thus reframes social intelligence evaluation toward theory-grounded diagnosis of socially consequential weaknesses in LLMs.
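
The diagnostic idea in the abstract, where a high aggregate score can hide a weakness that only appears at the dimension or facet level, can be illustrated with a minimal Python sketch. This is not the authors' scoring code: the record layout, the `accuracy_by` and `weakest` helpers, the placeholder dimension, and every outcome in `toy_results` are assumptions invented for illustration. Only the Communication dimension and its three facet names are taken from the abstract.

```python
from collections import defaultdict

# Hypothetical item-level results: (dimension, facet, answered_correctly).
# The facet labels under "Communication" come from the abstract; the
# placeholder dimension, the item counts, and all outcomes are invented.
toy_results = [
    ("Communication", "multi-turn communication", False),
    ("Communication", "multi-turn communication", True),
    ("Communication", "nonverbal communication", False),
    ("Communication", "synchrony", False),
    ("PlaceholderDimension", "placeholder facet", True),
    ("PlaceholderDimension", "placeholder facet", True),
]


def accuracy_by(results, key_fn):
    """Group items with key_fn and return {group: fraction answered correctly}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for dimension, facet, ok in results:
        key = key_fn(dimension, facet)
        totals[key] += 1
        correct[key] += int(ok)
    return {key: correct[key] / totals[key] for key in totals}


def weakest(scores):
    """Return the group with the lowest accuracy (first one found on ties)."""
    return min(scores, key=scores.get)


# Aggregate accuracy over all items, then the same items sliced more finely.
overall = sum(ok for _, _, ok in toy_results) / len(toy_results)
by_dimension = accuracy_by(toy_results, lambda d, f: d)
by_facet = accuracy_by(toy_results, lambda d, f: (d, f))

print(f"overall accuracy: {overall:.2f}")
print(f"weakest dimension: {weakest(by_dimension)}")
for (dimension, facet), acc in sorted(by_facet.items(), key=lambda kv: kv[1]):
    print(f"  {dimension} / {facet}: {acc:.2f}")
```

In this toy data the single aggregate number gives no hint of where the errors sit, while the dimension and facet breakdowns point to the groups that account for them. That is the kind of localization the framework is described as enabling.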
