2602.13262v1 Feb 03, 2026 cs.AI

General learned delegation by clones

Darren Li (Citations: 5, h-index: 1)

Meiqi Chen (Citations: 0, h-index: 0)

Chenze Shao (Citations: 449, h-index: 14)

Fandong Meng (Citations: 6,827, h-index: 40)

Jie Zhou (Citations: 614, h-index: 14)

Original Abstract

Frontier language models improve with additional test-time computation, but serial reasoning or uncoordinated parallel sampling can be compute-inefficient under fixed inference budgets. We propose SELFCEST, which equips a base model with the ability to spawn same-weight clones in separate parallel contexts by agentic reinforcement learning. Training is end-to-end under a global task reward with shared-parameter rollouts, yielding a learned controller that allocates both generation and context budget across branches. Across challenging math reasoning benchmarks and long-context multi-hop QA, SELFCEST improves the accuracy-cost Pareto frontier relative to monolithic baselines at matched inference budget, and exhibits out-of-distribution generalization in both domains.
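The abstract does not describe the mechanism in detail, so the following is only a toy sketch of the general idea it names: a parent splits a fixed token budget across same-model clones running in parallel contexts, and every branch later receives the same global task reward. It is not the authors' implementation. All names here (`Model`, `Trajectory`, `split_budget`, `delegate`, `assign_global_reward`, `toy_model`) are hypothetical, the model is a stub, and in SELFCEST the allocation is learned by reinforcement learning rather than passed in as fixed weights.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a language model call: (prompt, max_tokens) -> text.
Model = Callable[[str, int], str]


@dataclass
class Trajectory:
    prompt: str
    output: str
    tokens_used: int
    reward: float = 0.0  # filled in later from the global task reward


def split_budget(total_tokens: int, weights: List[float]) -> List[int]:
    """Divide a token budget across branches in proportion to the weights."""
    total_weight = sum(weights)
    shares = [int(total_tokens * w / total_weight) for w in weights]
    shares[0] += total_tokens - sum(shares)  # rounding remainder goes to branch 0
    return shares


def delegate(model: Model, subtasks: List[str], weights: List[float],
             total_tokens: int) -> List[Trajectory]:
    """Run one clone per subtask in parallel; every clone calls the same model."""
    budgets = split_budget(total_tokens, weights)

    def run(job):
        prompt, budget = job
        output = model(prompt, budget)
        # Whitespace-separated words stand in for tokens in this toy setting.
        return Trajectory(prompt, output, len(output.split()))

    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        return list(pool.map(run, zip(subtasks, budgets)))  # order is preserved


def assign_global_reward(trajectories: List[Trajectory], reward: float) -> None:
    """Give every branch the same task-level reward (shared credit)."""
    for trajectory in trajectories:
        trajectory.reward = reward


def toy_model(prompt: str, max_tokens: int) -> str:
    """Stub model that always emits exactly max_tokens placeholder words."""
    return " ".join(["tok"] * max_tokens)


if __name__ == "__main__":
    branches = delegate(toy_model, ["check case A", "check case B"], [1.0, 3.0], 40)
    assign_global_reward(branches, reward=1.0)
    for branch in branches:
        print(branch.prompt, branch.tokens_used, branch.reward)
```

Because each clone is the same callable, a single update signal applies to every branch, which is one way to read "shared-parameter rollouts". The fixed `weights` argument is the placeholder for whatever the learned controller would decide.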

0 Citations

0 Influential

20 Altmetric

100.0 Score
