2604.10720v1 Apr 12, 2026 cs.AI

학습자가 코딩하는 방식을 언어 모델에 가르치는 방법: 학생 시뮬레이션을 위한 대화형 직렬화

Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation

Charles Koutcheme

Citations: 389

h-index: 8

Arto Hellas

Citations: 4,068

h-index: 28

Juho Leinonen

Citations: 5,288

h-index: 34

교육 시스템 내에서 학습자의 행동과 반응을 시뮬레이션하는 인공 모델은 튜터링 전략 및 피드백 메커니즘을 대규모로 평가하는 데 유망한 도구입니다. 그러나 프로그래밍 교육의 많은 기존 접근 방식은 대규모의 독점적인 언어 모델에 의존하며, 이는 개인 정보 보호, 비용 및 의존성 문제를 야기합니다. 본 연구에서는 실제 학생 과정 데이터를 사용하여 개방형 가중치 인공 프로그래밍 학습자를 훈련하는 방법을 제안합니다. 우리 접근 방식은 시간 순서의 로그 데이터를 대화 형식으로 직렬화하여, 각 학생의 문제 해결 과정을 학습자와 자동 평가 시스템 간의 대화로 표현합니다. 학생의 코드 제출물과 테스트 결과, 성적, 오류 추적과 같은 환경 피드백이 번갈아 가며 대화의 턴을 구성하여, 모델이 반복적인 디버깅 과정을 통해 학습할 수 있도록 합니다. 또한, 우리는 모델을 실제 학생의 디버깅 행동과 일치시키기 위해 지도 학습 미세 조정과 선호도 최적화를 결합한 훈련 파이프라인을 도입했습니다. 우리는 4B 및 8B 규모의 Qwen 모델을 실제 학생이 Python 프로그래밍 과제에 제출한 대규모 데이터 세트를 사용하여 훈련하여 우리 프레임워크를 평가했습니다. 결과는 환경 피드백을 통합하면 모델이 학생의 디버깅 행동을 모방하는 능력을 강화하여 기능적 정렬 및 코드 유사성 측면에서 기존의 코드 기반 접근 방식 및 프롬프트 기반 대규모 언어 모델 기준을 능가한다는 것을 보여줍니다. 우리는 재현성을 지원하기 위해 코드를 공개합니다.

Original Abstract

Artificial models that simulate how learners act and respond within educational systems are a promising tool for evaluating tutoring strategies and feedback mechanisms at scale. However, many existing approaches in programming education rely on prompting large, proprietary language models, raising concerns around privacy, cost, and dependence. In this work, we propose a method for training open-weight artificial programming learners using authentic student process data. Our approach serializes temporal log traces into a conversational format, representing each student's problem-solving process as a dialogue between the learner and their automated assessment system. Student code submissions and environment feedback, such as test outcomes, grades, and error traces, form alternating conversational turns, enabling models to learn from the iterative debugging process. We additionally introduce a training pipeline combining supervised fine-tuning with preference optimization to align models with authentic student debugging behavior. We evaluate our framework by training Qwen models at 4B and 8B scales on a large-scale dataset of real student submissions to Python programming assignments. Our results show that incorporating environment feedback strengthens the models' ability to replicate student debugging behavior, improving over both prior code-only approaches and prompted large language models baselines in functional alignment and code similarity. We release our code to support reproducibility.

1 Citations

0 Influential

17 Altmetric

86.0 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!