2603.14769v1 Mar 16, 2026 cs.LG

POLCA: Stochastic Generative Optimization with LLM

Xuan Ren (Citations: 53; h-index: 3)

Allen Nie (Citations: 129; h-index: 5)

Tengyang Xie (Citations: 57; h-index: 4)

Ching-An Cheng (Citations: 185; h-index: 6)

Abstract

Optimizing complex systems, ranging from LLM prompts to multi-turn agents, traditionally requires labor-intensive manual iteration. We formalize this challenge as a stochastic generative optimization problem where a generative language model acts as the optimizer, guided by numerical rewards and text feedback to discover the best system. We introduce Prioritized Optimization with Local Contextual Aggregation (POLCA), a scalable framework designed to handle stochasticity in optimization (such as noisy feedback, sampling minibatches, and stochastic system behaviors) while effectively managing the unconstrained expansion of the solution space. POLCA maintains a priority queue to manage the exploration-exploitation tradeoff, systematically tracking candidate solutions and their evaluation histories. To enhance efficiency, we integrate an $\varepsilon$-Net mechanism to maintain parameter diversity and an LLM Summarizer to perform meta-learning across historical trials. We theoretically prove that POLCA converges to near-optimal candidate solutions under stochasticity. We evaluate our framework on diverse benchmarks, including $\tau$-bench, HotpotQA (agent optimization), VeriBench (code translation), and KernelBench (CUDA kernel generation). Experimental results demonstrate that POLCA achieves robust, sample- and time-efficient performance, consistently outperforming state-of-the-art algorithms in both deterministic and stochastic problems. The codebase for this work is publicly available at https://github.com/rlx-lab/POLCA.
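To make the two mechanisms named in the abstract more concrete, the following is a minimal toy sketch of a priority-queue search loop with an $\varepsilon$-net diversity filter under noisy rewards. It is not the authors' implementation (see the linked repository for that). Everything here is an assumption made for illustration: candidates are plain floats standing in for prompts or programs, `propose` is a random perturbation standing in for the LLM optimizer, `noisy_reward` is a made-up objective, the UCB-style `priority` formula is a guess at one reasonable scoring rule, and the LLM Summarizer is omitted entirely.

```python
import heapq
import math
import random
from dataclasses import dataclass, field


@dataclass
class Candidate:
    param: float                                  # hypothetical stand-in for a prompt or program
    rewards: list = field(default_factory=list)   # evaluation history of this candidate

    def mean(self) -> float:
        return sum(self.rewards) / len(self.rewards)

    def priority(self, bonus: float) -> float:
        # Assumed UCB-style score: empirical mean plus a bonus that shrinks
        # as the candidate accumulates evaluations.
        return self.mean() + bonus / math.sqrt(len(self.rewards))


def noisy_reward(x: float, rng: random.Random) -> float:
    # Made-up stochastic objective whose noiseless maximum is at x = 3.
    return -(x - 3.0) ** 2 + rng.gauss(0.0, 0.5)


def propose(x: float, rng: random.Random) -> float:
    # Placeholder for the LLM optimizer: a random local edit of the parent.
    return x + rng.gauss(0.0, 1.0)


def is_novel(x: float, pool: list, eps: float) -> bool:
    # Epsilon-net filter: accept x only if it is farther than eps
    # from every candidate already stored.
    return all(abs(x - c.param) > eps for c in pool)


def optimize(steps: int = 200, eps: float = 0.25, bonus: float = 1.0, seed: int = 0) -> list:
    rng = random.Random(seed)
    start = Candidate(0.0, [noisy_reward(0.0, rng)])
    pool = [start]
    # heapq is a min-heap, so priorities are negated; the counter breaks ties
    # so that Candidate objects are never compared directly.
    heap = [(-start.priority(bonus), 0, start)]
    counter = 1
    for _ in range(steps):
        _, _, cand = heapq.heappop(heap)
        # Re-evaluate the selected candidate to reduce noise in its estimate.
        cand.rewards.append(noisy_reward(cand.param, rng))
        heapq.heappush(heap, (-cand.priority(bonus), counter, cand))
        counter += 1
        # Ask the (placeholder) optimizer for a new candidate and keep it
        # only if it passes the diversity filter.
        new_x = propose(cand.param, rng)
        if is_novel(new_x, pool, eps):
            child = Candidate(new_x, [noisy_reward(new_x, rng)])
            pool.append(child)
            heapq.heappush(heap, (-child.priority(bonus), counter, child))
            counter += 1
    return pool


if __name__ == "__main__":
    pool = optimize()
    best = max(pool, key=lambda c: c.mean())
    print(f"{len(pool)} candidates kept; best param so far: {best.param:.2f}")
```

In this sketch the filter bounds how densely the pool can cover any region, which is one simple way to keep the candidate set from growing without limit, while the shrinking bonus shifts evaluations from rarely tested candidates toward those with consistently high averages.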

Citations: 1

Influential citations: 0

Altmetric: 34.51292546497

Score: 173.6
