2604.14768v1 Apr 16, 2026 cs.AI

CoTEvol: Self-Evolving Chain-of-Thoughts for Data Synthesis in Mathematical Reasoning

Zenglin Xu

Citations: 43

h-index: 2

Zhuo Zhang

Citations: 265

h-index: 7

Yafu Li

Westlake University

Citations: 2,535

h-index: 14

Lizhen Qu

Citations: 15

h-index: 2

Yu Cheng

Citations: 130

h-index: 6

Zhuo Wang

Citations: 12

h-index: 2

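Large language models (LLMs) show strong mathematical reasoning ability when trained on high-quality chain-of-thought (CoT) data that clearly lays out intermediate steps, but the high cost of constructing CoT data holds back further progress. Existing methods, such as distillation from stronger LLMs and self-synthesis based on test-time search, try to ease this problem but often suffer from diminishing returns or high computational cost. This work proposes CoTEvol, a genetic evolutionary framework that treats CoT generation as a population-based search over reasoning trajectories. Candidate trajectories are evolved iteratively through global crossover at the trajectory level and uncertainty-guided local mutation at the step level, which enables both holistic recombination and fine-grained refinement. Lightweight, task-aware fitness functions steer the evolutionary process toward accurate and diverse reasoning. In experiments, CoTEvol raises the success rate of synthesizing correct CoT data by more than 30%, increases structural diversity, and is markedly more efficient. LLMs trained on the evolved CoT data generated by CoTEvol gain 6.6% on average across eight mathematical benchmarks, outperforming existing distillation and self-synthesis approaches. These results suggest that evolutionary CoT data generation is a scalable and effective approach to mathematical reasoning tasks.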

Original Abstract

Large Language Models (LLMs) exhibit strong mathematical reasoning when trained on high-quality Chain-of-Thought (CoT) that articulates intermediate steps, yet costly CoT curation hinders further progress. While existing remedies such as distillation from stronger LLMs and self-synthesis based on test-time search alleviate this issue, they often suffer from diminishing returns or high computing overhead. In this work, we propose CoTEvol, a genetic evolutionary framework that casts CoT generation as a population-based search over reasoning trajectories. Candidate trajectories are iteratively evolved through reflective global crossover at the trajectory level and local mutation guided by uncertainty at the step level, enabling holistic recombination and fine-grained refinement. Lightweight, task-aware fitness functions are designed to guide the evolutionary process toward accurate and diverse reasoning. Empirically, CoTEvol improves correct-CoT synthesis success by over 30% and enhances structural diversity, with markedly improved efficiency. LLMs trained on these evolutionary CoT data achieve an average gain of 6.6% across eight math benchmarks, outperforming previous distillation and self-synthesis approaches. These results underscore the promise of evolutionary CoT synthesis as a scalable and effective method for mathematical reasoning tasks.
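
The abstract describes the search loop only at a high level, so the following is a minimal sketch of a generic population-based search with trajectory-level crossover and step-level mutation, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`crossover`, `mutate`, `evolve`), the prefix/suffix splice standing in for the paper's reflective crossover, the elitist selection scheme, and the toy task in which each "reasoning step" is a signed integer string and fitness rewards trajectories whose steps sum to a target. In CoTEvol itself these operators and the fitness functions would presumably involve an LLM.

```python
import random
from typing import Callable, List

Trajectory = List[str]  # one reasoning step per element (toy representation)


def crossover(a: Trajectory, b: Trajectory, rng: random.Random) -> Trajectory:
    """Trajectory-level recombination: a prefix of one parent plus a suffix of the other.
    (Assumed stand-in for the paper's reflective global crossover.)"""
    cut_a = rng.randint(0, len(a))
    cut_b = rng.randint(0, len(b))
    child = a[:cut_a] + b[cut_b:]
    return child or list(a)  # fall back to a parent rather than return an empty trajectory


def mutate(t: Trajectory,
           uncertainty: Callable[[str], float],
           rewrite: Callable[[str, random.Random], str],
           rng: random.Random) -> Trajectory:
    """Step-level mutation: rewrite only the step with the highest uncertainty score."""
    idx = max(range(len(t)), key=lambda i: uncertainty(t[i]))
    return t[:idx] + [rewrite(t[idx], rng)] + t[idx + 1:]


def evolve(population: List[Trajectory],
           fitness: Callable[[Trajectory], float],
           uncertainty: Callable[[str], float],
           rewrite: Callable[[str, random.Random], str],
           generations: int = 10,
           elite: int = 2,
           seed: int = 0) -> Trajectory:
    """Evolve non-empty trajectories; the top `elite` survive each generation unchanged."""
    rng = random.Random(seed)
    pop = [list(t) for t in population]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:elite]
        parents = pop[:max(2, len(pop) // 2)]  # fitter half acts as the parent pool
        children = []
        while len(survivors) + len(children) < len(pop):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), uncertainty, rewrite, rng))
        pop = survivors + children
    return max(pop, key=fitness)


# Toy task (hypothetical): steps are signed integers, and a "correct" trajectory sums to TARGET.
TARGET = 10


def toy_fitness(t: Trajectory) -> float:
    return -abs(sum(int(s) for s in t) - TARGET)


def toy_uncertainty(step: str) -> float:
    return abs(int(step))  # pretend that larger steps are the less certain ones


def toy_rewrite(step: str, rng: random.Random) -> str:
    return f"{rng.randint(-5, 5):+d}"  # resample the step


initial = [["+1", "-4", "+2"], ["+3", "+3"], ["-2", "+5", "+1"], ["+4", "-1"]]
best = evolve(initial, toy_fitness, toy_uncertainty, toy_rewrite)
print(best, toy_fitness(best))
```

Because the fittest trajectories are carried over unchanged, the best fitness in the population can never decrease from one generation to the next. That makes the loop easy to sanity-check even when the operators are swapped for model-driven ones.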
