2604.22367v1 Apr 24, 2026 cs.CL

CNSL-bench: Benchmarking the Sign Language Understanding Capabilities of MLLMs on Chinese National Sign Language

Rui Zhao

Citations: 50

h-index: 3

Xu-Xiang Zhong

Citations: 7

h-index: 2

Jinsong Su

Citations: 20

h-index: 2

Yidong Chen

Citations: 116

h-index: 5

Xiaoyun Zheng

Citations: 0

h-index: 0

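Advances in large language models (LLMs) have driven significant progress in sign language research. However, the ability of LLMs to understand sign language, particularly in multimodal settings, has not yet been sufficiently studied. To address this limitation, we introduce CNSL-bench, the first comprehensive Chinese National Sign Language benchmark for evaluating the sign language understanding of multimodal large language models (MLLMs). CNSL-bench has the following characteristics: 1) Authoritative grounding: it is based on the officially standardized National Common Sign Language Dictionary, which reduces ambiguity from regional or non-standard variants and ensures consistent semantic definitions. 2) Multimodal coverage: it provides aligned textual descriptions, illustrative images, and sign language videos. 3) Articulatory diversity: it supports key manual articulatory forms, including air-writing, finger-spelling, and the Chinese manual alphabet, enabling fine-grained analysis. Using CNSL-bench, we extensively evaluated 21 recent open-source and proprietary MLLMs. The results show that, despite recent advances in multimodal modeling, current MLLMs still fall well short of human performance and differ systematically across input modalities and manual articulatory forms. Further analysis indicates that several performance limitations persist beyond what improved reasoning addresses, and that instruction-following ability varies considerably across models.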

Original Abstract

Sign language research has achieved significant progress due to the advances in large language models (LLMs). However, the intrinsic ability of LLMs to understand sign language, especially in multimodal contexts, remains underexplored. To address this limitation, we introduce CNSL-bench, the first comprehensive Chinese National Sign Language benchmark designed for evaluating multimodal large language models (MLLMs) in sign language understanding. The proposed CNSL-bench is characterized by: 1) Authoritative grounding, as it is anchored to the officially standardized National Common Sign Language Dictionary, mitigating ambiguity from regional or non-canonical variants and ensuring consistent semantic definitions; 2) Multimodal coverage, providing aligned textual descriptions, illustrative images, and sign language videos; and 3) Articulatory diversity, supporting fine-grained analysis across key manual articulatory forms, including air-writing, finger-spelling, and the Chinese manual-alphabet. Using CNSL-bench, we extensively evaluate 21 open-source and proprietary up-to-date MLLMs. Our results reveal that, despite recent advances in multimodal modeling, current MLLMs remain substantially inferior to human performance, exhibiting systematic disparities across input modalities and manual articulatory forms. Additional diagnostic analyses suggest that several performance limitations persist beyond improvements in reasoning and that instruction-following robustness varies substantially across models.

0 Citations

0 Influential

2.5 Altmetric

12.5 Score
