2601.17006v1 Jan 14, 2026 cs.LG

MathMixup: Boosting LLM Mathematical Reasoning with Difficulty-Controllable Data Synthesis and Curriculum Learning

Jing Chen (Citations: 3,781; h-index: 6)

Xuchen Li (Citations: 6; h-index: 1)

Xuzhao Li (Citations: 89; h-index: 5)

Hao Liang (Citations: 110; h-index: 4)

Xiaohuan Zhou (Citations: 94; h-index: 3)

Taifeng Wang (Citations: 1,505; h-index: 12)

Wentao Zhang (Citations: 38; h-index: 3)

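Progress in the mathematical reasoning ability of large language models (LLMs) depends heavily on high-quality training data with clearly defined, well-structured difficulty levels. However, existing data generation methods often lack diversity and offer little precise control over problem difficulty, so they fall short of supporting efficient training paradigms such as curriculum learning. To address these problems, we propose MathMixup, a new data generation paradigm. MathMixup uses hybrid and decomposition strategies to systematically generate high-quality mathematical reasoning problems with controllable difficulty. Automated self-checking and manual review ensure the semantic clarity of the generated data and a well-structured difficulty gradient. On this basis, we build the MathMixupQA dataset and design a curriculum learning strategy that uses these difficulty-graded problems and supports flexible integration with other datasets. Experiments show that MathMixup and its accompanying curriculum learning strategy substantially improve the mathematical reasoning performance of LLMs. The fine-tuned Qwen2.5-7B model achieves an average score of 52.6% across seven mathematical benchmarks, surpassing the previous best performance. These results demonstrate that MathMixup is effective and broadly applicable for improving the mathematical reasoning ability of LLMs and for advancing data-centric curriculum learning.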

Original Abstract

In mathematical reasoning tasks, the advancement of Large Language Models (LLMs) relies heavily on high-quality training data with clearly defined and well-graded difficulty levels. However, existing data synthesis methods often suffer from limited diversity and lack precise control over problem difficulty, making them insufficient for supporting efficient training paradigms such as curriculum learning. To address these challenges, we propose MathMixup, a novel data synthesis paradigm that systematically generates high-quality, difficulty-controllable mathematical reasoning problems through hybrid and decomposed strategies. Automated self-checking and manual screening are incorporated to ensure semantic clarity and a well-structured difficulty gradient in the synthesized data. Building on this, we construct the MathMixupQA dataset and design a curriculum learning strategy that leverages these graded problems, supporting flexible integration with other datasets. Experimental results show that MathMixup and its curriculum learning strategy significantly enhance the mathematical reasoning performance of LLMs. Fine-tuned Qwen2.5-7B achieves an average score of 52.6% across seven mathematical benchmarks, surpassing previous state-of-the-art methods. These results fully validate the effectiveness and broad applicability of MathMixup in improving the mathematical reasoning abilities of LLMs and advancing data-centric curriculum learning.
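The abstract does not spell out how the graded problems are scheduled during training. As a minimal sketch of the general idea of difficulty-ordered curriculum learning, and not the authors' implementation, the Python below groups problems by an integer difficulty label and returns training stages from easiest to hardest. The `Problem` class, its `difficulty` field, and `build_curriculum` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Problem:
    question: str
    answer: str
    difficulty: int  # hypothetical graded level; 1 is assumed to be easiest


def build_curriculum(problems: List[Problem]) -> List[List[Problem]]:
    """Group problems by difficulty and return stages ordered easy to hard."""
    by_level: Dict[int, List[Problem]] = {}
    for p in problems:
        by_level.setdefault(p.difficulty, []).append(p)
    # Sorting the level keys yields one training stage per difficulty level.
    return [by_level[level] for level in sorted(by_level)]


# Toy data, invented for illustration only.
problems = [
    Problem("Solve x^2 - 5x + 6 = 0.", "x = 2 or x = 3", difficulty=2),
    Problem("Compute 3 + 4.", "7", difficulty=1),
    Problem("Compute 6 * 7.", "42", difficulty=1),
]
stages = build_curriculum(problems)
for i, stage in enumerate(stages, start=1):
    print(f"stage {i}: {[p.question for p in stage]}")
```

Under this assumption, problems from another dataset could be merged simply by assigning them a difficulty label and appending them to the input list before calling `build_curriculum`.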

Citations: 1; Influential: 0; Altmetric: 6; Score: 31.0
