2605.28010v1 May 27, 2026 cs.AI

Confidence-Orchestrated Self-Evolution against Uncertain LLM Feedback

Jinhao Pan
Bowen Wei
George Mason University
Ziwei Zhu
Nanshu Wang
Yuqing Zhou

Self-evolving large language models (LLMs) learn by generating their own training tasks and solutions, reducing reliance on human-curated supervision. However, in many reasoning domains, the model must also validate generated tasks and judge generated answers to obtain training signals. This creates a training-signal challenge: erroneous self-judgments become erroneous gradient updates. Existing approaches either rely on external verifiers, which limits generality, or treat noisy self-generated feedback as if it were reliable supervision. We propose COSE (Confidence-Orchestrated Self-Evolution), which uses the LLM's intrinsic confidence as a lightweight uncertainty signal to modulate learning. COSE introduces confidence-weighted PPO updates and confidence-prioritized replay. Across 19 held-out benchmarks and four Qwen/Llama backbones (0.6B to 4B), COSE consistently improves over base models and achieves the best average performance in general reasoning and mathematics, while remaining competitive on code. Code and data are available at https://anonymous.4open.science/r/COSE_-B5C2.
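The abstract names COSE's two mechanisms but does not give their formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) the intrinsic confidence of a response is its geometric-mean token probability, (2) a confidence-weighted PPO update scales each sample's standard clipped-surrogate term by that confidence, and (3) confidence-prioritized replay samples stored tasks with probability proportional to confidence raised to a hypothetical exponent `alpha`. All names (`sequence_confidence`, `confidence_weighted_ppo_loss`, `ReplayItem`, `ConfidenceReplayBuffer`) and the lowest-confidence eviction rule are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """ASSUMED confidence score: geometric-mean token probability, i.e.
    exp of the mean log-probability the model gave its own output."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def confidence_weighted_ppo_loss(
    ratios: Sequence[float],
    advantages: Sequence[float],
    confidences: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """PPO clipped surrogate with each sample's term scaled by its
    confidence (an ASSUMED reading of 'confidence-weighted PPO')."""
    if not (len(ratios) == len(advantages) == len(confidences)) or not ratios:
        raise ValueError("inputs must be non-empty and equal length")
    total = 0.0
    for r, a, c in zip(ratios, advantages, confidences):
        unclipped = r * a
        # Clip the probability ratio into [1 - eps, 1 + eps].
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps) * a
        total += c * min(unclipped, clipped)
    # Negate so that minimizing the loss maximizes the weighted objective.
    return -total / len(ratios)


@dataclass
class ReplayItem:
    task: str
    confidence: float


class ConfidenceReplayBuffer:
    """HYPOTHETICAL replay buffer: samples items with probability
    proportional to confidence ** alpha (alpha = 0 gives uniform sampling)."""

    def __init__(self, capacity: int, alpha: float = 1.0, seed: int = 0):
        self.capacity = capacity
        self.alpha = alpha
        self.items: List[ReplayItem] = []
        self.rng = random.Random(seed)

    def add(self, item: ReplayItem) -> None:
        self.items.append(item)
        if len(self.items) > self.capacity:
            # ASSUMED policy: evict the lowest-confidence item when full.
            worst = min(range(len(self.items)),
                        key=lambda i: self.items[i].confidence)
            self.items.pop(worst)

    def sample(self, k: int) -> List[ReplayItem]:
        weights = [it.confidence ** self.alpha for it in self.items]
        return self.rng.choices(self.items, weights=weights, k=k)


if __name__ == "__main__":
    print(sequence_confidence([-0.1, -0.2, -0.3]))  # exp(-0.2)
    print(confidence_weighted_ppo_loss([1.0, 1.5], [1.0, 1.0], [1.0, 0.5]))
```

With every confidence set to 1, the loss reduces to the ordinary PPO clipped objective, so under these assumptions the weighting only changes how far low-confidence self-judgments can move the policy; `alpha` likewise interpolates between uniform and strongly confidence-biased replay.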
