2601.14063v1 Jan 20, 2026 cs.CL

XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs

Mohsinul Kabir, T. Ahmed, Md Mezbaur Rahman, Hassan Alhuzali, Sophia Ananiadou, Jimin Huang, Shaoxiong Ji, Yuechen Jiang

Original Abstract

Cross-cultural competence in large language models (LLMs) requires the ability to identify Culture-Specific Items (CSIs) and to adapt them appropriately across cultural contexts. Progress in evaluating this capability has been constrained by the scarcity of high-quality CSI-annotated corpora with parallel cross-cultural sentence pairs. To address this limitation, we introduce XCR-Bench, a Cross(X)-Cultural Reasoning Benchmark consisting of 4.9k parallel sentences and 1,098 unique CSIs, spanning three distinct reasoning tasks with corresponding evaluation metrics. Our corpus integrates Newmark's CSI framework with Hall's Triad of Culture, enabling systematic analysis of cultural reasoning beyond surface-level artifacts and into semi-visible and invisible cultural elements such as social norms, beliefs, and values. Our findings show that state-of-the-art LLMs exhibit consistent weaknesses in identifying and adapting CSIs related to social etiquette and cultural reference. Additionally, we find evidence that LLMs encode regional and ethno-religious biases even within a single linguistic setting during cultural adaptation. We release our corpus and code to facilitate future research on cross-cultural NLP.
