2602.01935v1 Feb 02, 2026 cs.LG

COLT: Lightweight Multi-LLM Collaboration through Shared MCTS Reasoning for Model Compilation

Annabelle Sujun Tang (Citations: 8, h-index: 2)

Christopher Priebe (Citations: 14, h-index: 3)

Lianhui Qin (Citations: 8, h-index: 2)

Hadi Esmaeilzadeh (Citations: 17, h-index: 3)

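Because model execution costs account for a substantial share of AI system costs, compiler optimization is essential for scalable deployment. Recent work has shown that a large language model (LLM) can guide compiler search based on program structure and optimization history. However, using a single large model throughout the search is expensive, while small models used alone are often unreliable. This paper therefore asks whether collaborative multi-LLM reasoning that relies mainly on small LLMs can match or exceed the performance of a single large model. To that end, it proposes COLT, a lightweight multi-LLM collaboration framework that enables coordinated reasoning across several models within a single Monte Carlo tree search (MCTS) process. The key contribution is the use of a single shared MCTS tree as the collaboration substrate across LLMs, which allows transformation prefixes to be reused and values to propagate across models. Rather than relying on heavy internal reasoning mechanisms or on conventional agentic machinery (external planners, multiple concurrent LLMs, databases, external memory or versioning of intermediate results, and controllers), COLT performs model selection inside the lightweight MCTS optimization loop itself. At each iteration, the acting LLM proposes a joint action: (compiler transformation, model to query next). The paper also introduces a model-aware tree policy that steers search toward smaller models while preserving exploration, and a mechanism that switches to the largest model when persistent regressions attributable to smaller models are detected.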

Original Abstract

Model serving costs dominate AI systems, making compiler optimization essential for scalable deployment. Recent works show that a large language model (LLM) can guide compiler search by reasoning over program structure and optimization history. However, using a single large model throughout the search is expensive, while smaller models are less reliable when used alone. Thus, this paper seeks to answer whether multi-LLM collaborative reasoning relying primarily on small LLMs can match or exceed the performance of a single large model. As such, we propose a lightweight collaborative multi-LLM framework, dubbed COLT, for compiler optimization that enables coordinated reasoning across multiple models within a single Monte Carlo tree search (MCTS) process. A key contribution is the use of a single shared MCTS tree as the collaboration substrate across LLMs, enabling the reuse of transformation prefixes and cross-model value propagation. Hence, we circumvent both heavy internal reasoning mechanisms and conventional agentic machinery that relies on external planners, multiple concurrent LLMs, databases, external memory/versioning of intermediate results, and controllers by simply endogenizing model selection within the lightweight MCTS optimization loop. Every iteration, the acting LLM proposes a joint action: (compiler transformation, model to be queried next). We also introduce a model-aware tree policy that biases search toward smaller models while preserving exploration, and a course-alteration mechanism that escalates to the largest model when the search exhibits persistent regressions attributable to smaller models.
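The abstract describes three mechanisms without giving formulas: a shared tree whose edges are joint actions, a tree policy biased toward smaller models, and an escalation rule. The sketch below is one way these pieces might fit together, not the authors' implementation. Everything specific in it is an assumption made for illustration: the model names and relative costs in `MODEL_COST`, the logarithmic cost penalty and its weight `lam`, the UCT exploration constant `c`, and the fixed-window regression test in `should_escalate`.

```python
import math
from dataclasses import dataclass, field

# Hypothetical model pool: name -> relative query cost (assumed values).
MODEL_COST = {"small": 1.0, "medium": 4.0, "large": 16.0}
LARGEST = max(MODEL_COST, key=MODEL_COST.get)


@dataclass
class Node:
    """A node of the single shared tree; it is reached by a joint action."""
    transform: str = "root"   # compiler transformation applied to reach this node
    model: str = "small"      # model chosen to be queried next from this node
    visits: int = 0
    value_sum: float = 0.0
    children: list = field(default_factory=list)

    @property
    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def model_aware_uct(parent: Node, child: Node, c: float = 1.4, lam: float = 0.1) -> float:
    """Standard UCT score minus an assumed cost penalty that favors cheap models."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    penalty = lam * math.log(MODEL_COST[child.model])
    return child.mean_value + explore - penalty


def select_child(parent: Node) -> Node:
    """Pick the child with the highest model-aware score."""
    return max(parent.children, key=lambda ch: model_aware_uct(parent, ch))


def backpropagate(path: list, reward: float) -> None:
    """Update every node on the path, regardless of which model proposed it.

    Because all models write into the same statistics, a value found by one
    model informs later selections made on behalf of any other model.
    """
    for node in path:
        node.visits += 1
        node.value_sum += reward


def should_escalate(recent: list, window: int = 3) -> bool:
    """Assumed rule: escalate after `window` consecutive regressions
    (negative reward deltas) that all came from non-largest models.

    `recent` is a list of (model_name, reward_delta) pairs, oldest first.
    """
    tail = recent[-window:]
    return len(tail) == window and all(m != LARGEST and d < 0 for m, d in tail)
```

With equal visit counts and mean values, `select_child` prefers the child whose next model is cheaper, while the exploration term still lets rarely visited large-model children win occasionally. When `should_escalate` returns true, a driver loop would route the next query to `LARGEST`.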
