2604.22542v1 Apr 24, 2026 cs.CL

Controllable Spoken Dialogue Generation: An LLM-Driven Grading System for K-12 Non-Native English Learners

Haokun Zhao (Citations: 34, h-index: 3)

Haidong Yuan (Citations: 0, h-index: 0)

Songjun Cao (Citations: 288, h-index: 9)

Wanshi Xu (Citations: 64, h-index: 5)

Qingyu Zhou, Harbin Institute of Technology (Citations: 1,911, h-index: 21)

Long Ma (Citations: 27, h-index: 2)

Hongjie Fan (Citations: 43, h-index: 1)

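Large language models (LLMs) often fail to meet the educational needs of K-12 English learners in non-native settings because the level of the model's output does not match the learner's proficiency. To address this widespread problem, this study proposes a proficiency-aligned framework that adapts LLM output to learner ability, using China's national curriculum (CSE) as a representative case. The framework controls lexical complexity precisely through a four-tier grading system and supports it with new resources: graded vocabulary lists and a multi-turn dialogue corpus. The core technical contribution is the **DDPO (Diversity Driven Policy Optimization)** algorithm, a GRPO-based approach that optimizes overall dialogue quality while preserving the diversity of multi-turn dialogue. The method substantially outperforms conventional approaches, achieving low out-of-vocabulary rates and high diversity while improving conversational naturalness and pedagogical value. Although grounded in the CSE, the framework is designed for flexibility and can be readily adapted to other educational standards. The models, data, and code will all be open-sourced, providing a platform for personalized English speaking practice that addresses the particular challenges K-12 learners face in non-immersive environments.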

Original Abstract

Large language models (LLMs) often fail to meet the pedagogical needs of K-12 English learners in non-native contexts due to a proficiency mismatch. To address this widespread challenge, we introduce a proficiency-aligned framework that adapts LLM outputs to learner abilities, using China's national curriculum (CSE) as a representative case. Our framework enables precise control over lexical complexity through a four-tier grading system, supported by a comprehensive suite of new resources: graded vocabulary lists and a multi-turn dialogue corpus. Our core technical contribution is the DDPO (Diversity Driven Policy Optimization) algorithm, a multi-turn GRPO-based approach designed to preserve dialogue diversity while holistically optimizing dialogue quality. This method significantly outperforms conventional approaches, achieving low out-of-vocabulary rates and high diversity while enhancing conversational naturalness and pedagogical value. While grounded in the CSE, our framework is designed for flexibility and can be readily adapted to other educational standards. Our models, data, and code will all be open-sourced, providing a scalable platform for personalized English speaking practice that effectively addresses the unique challenges faced by K-12 learners in non-immersive environments.
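The abstract reports two measurable quantities, out-of-vocabulary (OOV) rate against graded vocabulary lists and dialogue diversity, without defining them. The sketch below shows one common way such metrics could be computed. It is not the paper's implementation: the tiny `GRADED_VOCAB` table, the cumulative-tier assumption, the regex tokenizer, and the use of distinct-n as the diversity measure are all illustrative assumptions.

```python
import re

# Hypothetical graded vocabulary for a four-tier system. The words are
# invented for illustration and are NOT taken from the paper's lists.
GRADED_VOCAB = {
    1: {"i", "you", "like", "a", "the", "cat", "dog", "is", "my", "do"},
    2: {"because", "friend", "school", "play", "with", "every", "day"},
    3: {"although", "favorite", "describe", "weekend", "usually"},
    4: {"nevertheless", "opportunity", "environment", "experience"},
}


def allowed_words(tier):
    """Assume tiers are cumulative: tier k permits every word in tiers 1..k."""
    allowed = set()
    for level in range(1, tier + 1):
        allowed |= GRADED_VOCAB[level]
    return allowed


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def oov_rate(text, tier):
    """Fraction of tokens that fall outside the vocabulary allowed at `tier`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    allowed = allowed_words(tier)
    return sum(1 for tok in tokens if tok not in allowed) / len(tokens)


def distinct_n(utterances, n=2):
    """Distinct-n: unique n-grams divided by total n-grams across utterances."""
    ngrams = []
    for utterance in utterances:
        toks = tokenize(utterance)
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)
```

Under these assumptions, `oov_rate("I like the environment", 1)` counts one of four tokens as out of vocabulary for tier 1, while the same sentence has no OOV tokens at tier 4. A real system would likely need lemmatization and handling of proper nouns before a lookup like this is meaningful.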

0 Citations

0 Influential

10.5 Altmetric

52.5 Score
