2604.04723v1 Apr 06, 2026 cs.CL

Individual and Combined Effects of English as a Second Language and Typos on LLM Performance

Minda Zhao (Citations: 14; h-index: 2)
Mengyu Wang (Citations: 15; h-index: 2)
Weixuan Dong (Citations: 16; h-index: 2)
Serena Liu (Citations: 1,844; h-index: 4)
Prisha Sheth (Citations: 0; h-index: 0)
Mi Diao (Citations: 12; h-index: 2)
Nikhil Banga (Citations: 0; h-index: 0)
Oscar Melendez (Citations: 0; h-index: 0)
Arnav Sharma (Citations: 53; h-index: 2)
Marina Lin (Citations: 0; h-index: 0)
Yutong Yang (Citations: 0; h-index: 0)
Xinru Zhu (Citations: 7; h-index: 2)

Abstract

Large language models (LLMs) are used globally, and because much of their training data is in English, they typically perform best on English inputs. As a result, many non-native English speakers interact with them in English as a second language (ESL), and these inputs often contain typographical errors. Prior work has largely studied the effects of ESL variation and typographical errors separately, even though they often co-occur in real-world use. In this study, we use the Trans-EnV framework to transform standard English inputs into eight ESL variants and apply MulTypo to inject typos at three levels: low, moderate, and severe. We find that combining ESL variation and typos generally leads to larger performance drops than either factor alone, though the combined effect is not simply additive. This pattern is clearest on closed-ended tasks, where performance degradation can be characterized more consistently across ESL variants and typo levels, while results on open-ended tasks are more mixed. Overall, these findings suggest that evaluations on clean standard English may overestimate real-world model performance, and that evaluating ESL variation and typographical errors in isolation does not fully capture model behavior in realistic settings.
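The claim that the combined effect is "not simply additive" can be made concrete with a small calculation. The sketch below is not the authors' evaluation code: it assumes a single accuracy-style score per condition, uses placeholder variant names (the abstract does not list the eight ESL variants), and the numbers in the usage lines are invented purely for illustration. It enumerates the condition grid implied by the abstract (standard English plus eight ESL variants, crossed with no typos plus three typo levels) and computes an interaction term, defined here as the combined drop minus the sum of the two individual drops.

```python
from itertools import product

# Placeholder labels: the abstract gives only the counts (eight ESL variants,
# three typo levels), so these names are hypothetical.
ESL_VARIANTS = ["standard"] + [f"esl_{i}" for i in range(1, 9)]
TYPO_LEVELS = ["none", "low", "moderate", "severe"]


def condition_grid():
    """Every (ESL variant, typo level) pair, including the clean baseline."""
    return list(product(ESL_VARIANTS, TYPO_LEVELS))


def interaction(acc_clean, acc_esl, acc_typo, acc_both):
    """Combined drop minus the sum of the two individual drops.

    Zero means the two effects are purely additive. A positive value means
    the combination hurts more than the sum of the parts; a negative value
    means it hurts less.
    """
    drop_esl = acc_clean - acc_esl
    drop_typo = acc_clean - acc_typo
    drop_both = acc_clean - acc_both
    return drop_both - (drop_esl + drop_typo)


if __name__ == "__main__":
    # 9 input varieties x 4 typo settings
    print(len(condition_grid()))
    # Made-up accuracies for illustration only, not results from the paper.
    print(round(interaction(0.80, 0.75, 0.70, 0.60), 4))
```

Evaluating ESL variation and typos only in isolation fills in the first row and first column of this grid; the interaction term can only be estimated from the cells where both factors are present.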
