2603.28135v1 Mar 30, 2026 cs.AI

CoT2-Meta: Budgeted Metacognitive Control for Test-Time Reasoning

Bofei Gao, Zikai Xiao, Xinle Yu, Jiayu Qian, Siyuan Ma, Hailong Wang, Ruixiang Qian, Luqi Gong, Yang Liu

Abstract

Recent test-time reasoning methods improve performance by generating more candidate chains or searching over larger reasoning trees, but they typically lack explicit control over when to expand, what to prune, how to repair, and when to abstain. We introduce CoT2-Meta, a training-free metacognitive reasoning framework that combines object-level chain-of-thought generation with meta-level control over partial reasoning trajectories. The framework integrates four components: strategy-conditioned thought generation, tree-structured search, an online process oracle for step-level reasoning evaluation, and a meta-controller that allocates computation through expansion, pruning, repair, stopping, and fallback decisions. Under matched inference budgets, CoT2-Meta consistently outperforms strong single-path, sampling-based, and search-based baselines, including ReST-MCTS. On the default backbone, it achieves 92.8 EM on MATH, 90.4 accuracy on GPQA, 98.65 EM on GSM8K, 75.8 accuracy on BBEH, 85.6 accuracy on MMMU-Pro, and 48.8 accuracy on HLE, with gains over the strongest non-CoT2-Meta baseline of +3.6, +5.2, +1.15, +2.0, +4.3, and +4.3 points, respectively. Beyond these core results, the framework remains effective across a broader 15-benchmark suite spanning knowledge and QA, multi-hop reasoning, coding, and out-of-distribution evaluation. Additional analyses show better compute scaling, improved calibration, stronger selective prediction, targeted repair behavior, and consistent gains across backbone families. These results suggest that explicit metacognitive control is a practical design principle for reliable and compute-efficient test-time reasoning systems.
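The control loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the meta-controller makes its decisions, so everything below is an assumption for illustration only: the threshold rule in `decide`, the best-first frontier, the one-call-per-generation budget accounting, and the toy `toy_generate`, `toy_oracle`, `toy_repair`, and `toy_complete` stubs that stand in for a language model and a process oracle. Returning `None` when the budget runs out stands in for the fallback or abstain outcome.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple

Steps = Tuple[str, ...]


class Action(Enum):
    EXPAND = "expand"
    PRUNE = "prune"
    REPAIR = "repair"
    STOP = "stop"


@dataclass
class Node:
    steps: Steps            # partial reasoning trajectory
    score: float            # process-oracle score in [0, 1]
    repaired: bool = False  # each trajectory gets at most one repair attempt


def decide(node: Node, complete: bool, prune_below: float = 0.2,
           repair_below: float = 0.5, stop_above: float = 0.9) -> Action:
    """Hypothetical threshold rule (not from the paper): score -> control action."""
    if complete and node.score >= stop_above:
        return Action.STOP
    if node.score < prune_below:
        return Action.PRUNE
    if node.score < repair_below and not node.repaired:
        return Action.REPAIR
    # A finished but unconvincing trajectory has nothing left to expand.
    return Action.PRUNE if complete else Action.EXPAND


def run(question: str,
        generate: Callable[[str, Steps], List[Steps]],
        oracle: Callable[[str, Steps], float],
        repair: Callable[[str, Steps], Steps],
        is_complete: Callable[[Steps], bool],
        budget: int) -> Tuple[Optional[str], int]:
    """Best-first search under a budget of generation/repair calls."""
    frontier = [Node((), 1.0)]
    used = 0
    while frontier and used < budget:
        frontier.sort(key=lambda n: n.score)
        node = frontier.pop()  # highest-scoring trajectory
        action = decide(node, is_complete(node.steps))
        if action is Action.STOP:
            return node.steps[-1], used
        if action is Action.PRUNE:
            continue
        if action is Action.REPAIR:
            fixed = repair(question, node.steps)
            used += 1
            frontier.append(Node(fixed, oracle(question, fixed), repaired=True))
            continue
        for child in generate(question, node.steps):
            if used >= budget:
                break
            used += 1
            frontier.append(Node(child, oracle(question, child)))
    return None, used  # budget exhausted: fall back or abstain


# Toy stubs standing in for a model and a process oracle (all hypothetical).
STEP_SCORES = {"3*4=11": 0.4, "2+3=5": 0.1}  # unlisted steps score 0.95


def toy_generate(question: str, steps: Steps) -> List[Steps]:
    if not steps:
        return [("3*4=11",), ("2+3=5",)]
    if steps[-1] == "3*4=12":
        return [steps + ("answer: 14",)]
    return []


def toy_oracle(question: str, steps: Steps) -> float:
    return min((STEP_SCORES.get(s, 0.95) for s in steps), default=1.0)


def toy_repair(question: str, steps: Steps) -> Steps:
    return steps[:-1] + ("3*4=12",)


def toy_complete(steps: Steps) -> bool:
    return bool(steps) and steps[-1].startswith("answer:")


print(run("2+3*4", toy_generate, toy_oracle, toy_repair, toy_complete, budget=8))
```

In this toy run the controller prunes the low-scoring branch, repairs the branch with an arithmetic slip, expands the repaired branch, and stops once a complete trajectory scores above the stop threshold. With a budget too small to finish, the same call returns `None` instead of an answer.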
