2601.07153v1 Jan 12, 2026 cs.CL

Can Large Language Models Understand, Reason About, and Generate Code-Switched Text?

David Anugraha (Citations: 447; h-index: 8)
Haneul Yoo (Citations: 74; h-index: 4)
G. Winata (Citations: 47; h-index: 3)
Patrick Amadeus Irawan, MBZUAI (Citations: 98; h-index: 4)
Anirban Das (Citations: 101; h-index: 4)
Paresh Dashore (Citations: 3; h-index: 1)
Shreyas Kulkarni (Citations: 3; h-index: 1)
Ruochen Zhang, Brown University (Citations: 3,413; h-index: 11)
Haruki Sakajo (Citations: 6; h-index: 2)
Frederikus Hudi (Citations: 163; h-index: 5)
Anaelia Ovalle (Citations: 584; h-index: 10)
Syrielle Montariol (Citations: 236; h-index: 10)
Félix Gaschi (Citations: 43; h-index: 3)
M. Anugraha (Citations: 43; h-index: 2)
Rutuj Ravindra Puranik (Citations: 0; h-index: 0)
Zawad Hayat Ahmed (Citations: 0; h-index: 0)
Adril Putra Merin (Citations: 0; h-index: 0)
Emmanuele Chersoni (Citations: 1,115; h-index: 20)

Abstract

Code-switching is a pervasive phenomenon in multilingual communication, yet the robustness of large language models (LLMs) in mixed-language settings remains insufficiently understood. In this work, we present a comprehensive evaluation of LLM capabilities in understanding, reasoning over, and generating code-switched text. We introduce CodeMixQA, a novel benchmark with high-quality human annotations, comprising 16 diverse parallel code-switched language-pair variants that span multiple geographic regions and code-switching patterns and include both original scripts and their transliterated forms. Using this benchmark, we analyze the reasoning behavior of LLMs on code-switched question-answering tasks, shedding light on how models process and reason over mixed-language inputs. We further conduct a systematic evaluation of LLM-generated synthetic code-switched text, focusing on both naturalness and semantic fidelity, and uncover key limitations in current generation capabilities. Our findings reveal persistent challenges in both reasoning and generation under code-switching conditions and provide actionable insights for building more robust multilingual LLMs. We release the dataset and code as open source.

