2604.11188v1 Apr 13, 2026 cs.CL

MathAgent: Adversarial Evolution of Constraint Graphs for Mathematical Reasoning Data Synthesis

Min Zhang (Citations: 17, h-index: 3)
Zixiong Yu (Citations: 58, h-index: 4)
Jun Rao (Citations: 24, h-index: 3)
Guhan Chen (Citations: 35, h-index: 2)
Bohan Li (Citations: 50, h-index: 3)
Jiansheng Wei (Citations: 62, h-index: 4)
Song Tian (Citations: 8, h-index: 2)
Xiaojun Meng (Citations: 86, h-index: 6)

Original Abstract

Synthesizing high-quality mathematical reasoning data without human priors remains a significant challenge. Current approaches typically rely on seed data mutation or simple prompt engineering and often suffer from mode collapse and limited logical complexity. This paper proposes a hierarchical synthesis framework that, rather than treating data synthesis as a direct text generation task, formulates it as an unsupervised optimization problem over a constraint graph followed by semantic instantiation. We introduce a Legislator-Executor paradigm: the Legislator adversarially evolves structured generation blueprints that encode the constraints of a problem, while the Executor instantiates these specifications into diverse natural language scenarios. Decoupling skeleton design from linguistic realization lets the method prioritize constructing complex and diverse logical structures, which in turn guides high-quality data synthesis. In experiments on 10 models across the Qwen, Llama, Mistral, and Gemma series, models fine-tuned on 1K synthesized samples outperform those fine-tuned on widely used datasets of comparable scale (LIMO, s1K) across eight mathematical benchmarks and exhibit superior out-of-distribution generalization.
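
The split between blueprint evolution and instantiation can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the graph representation, the adversarial objective, or the Executor's prompts. Everything below is an assumption made for illustration, including the `Blueprint` class, the `chain_depth` score (a stand-in for whatever adversarial signal the Legislator actually optimizes), the relation labels in `OPS`, and the template-based `execute` function (a stand-in for LLM-based instantiation).

```python
import random
from dataclasses import dataclass

# Hypothetical relation labels; the paper's constraint vocabulary is not given.
OPS = ("twice", "three more than", "half of")


@dataclass(frozen=True)
class Blueprint:
    """Toy constraint graph: nodes are variables, edges are (src, op, dst)."""
    nodes: tuple
    edges: tuple


def chain_depth(bp):
    """Longest dependency chain, used here as a proxy for logical complexity."""
    children = {}
    for src, _, dst in bp.edges:
        children.setdefault(src, []).append(dst)

    def depth(node):
        return 1 + max((depth(c) for c in children.get(node, [])), default=0)

    return max((depth(n) for n in bp.nodes), default=0)


def mutate(bp, rng):
    """Add one new variable constrained by a randomly chosen existing one.

    New nodes only ever depend on older nodes, so the graph stays acyclic.
    """
    src = rng.choice(bp.nodes)
    dst = f"x{len(bp.nodes)}"
    return Blueprint(bp.nodes + (dst,), bp.edges + ((src, rng.choice(OPS), dst),))


def legislate(bp, rounds, candidates, rng):
    """Assumed 'Legislator' loop: propose mutations, keep the hardest-looking one."""
    for _ in range(rounds):
        pool = [mutate(bp, rng) for _ in range(candidates)]
        bp = max(pool, key=chain_depth)
    return bp


def execute(bp, noun):
    """Assumed 'Executor': render each constraint as a sentence about `noun`."""
    return [
        f"The number of {noun} in group {dst} is {op} the number in group {src}."
        for src, op, dst in bp.edges
    ]


seed = Blueprint(nodes=("x0",), edges=())
evolved = legislate(seed, rounds=5, candidates=4, rng=random.Random(0))
for sentence in execute(evolved, "apples"):
    print(sentence)
```

The same evolved blueprint can be passed to `execute` with a different noun to get a different surface scenario over an identical logical skeleton, which is the decoupling the abstract describes.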

Citations: 1, Influential: 0, Altmetric: 3, Score: 16.0