2601.09097v2 Jan 14, 2026 cs.AI

Programming over Thinking: Efficient and Robust Multi-Constraint Planning

Derrick-Goh-Xin Deik (Citations: 179, h-index: 7)

Quanyu Long (Citations: 274, h-index: 9)

Zhengyuan Liu (Citations: 15, h-index: 2)

Nancy F. Chen (Citations: 394, h-index: 10)

Wenya Wang (Citations: 1, h-index: 1)

Abstract

Multi-constraint planning involves identifying, evaluating, and refining candidate plans while satisfying multiple, potentially conflicting constraints. Existing large language model (LLM) approaches face fundamental limitations in this domain. Pure reasoning paradigms, which rely on long natural language chains, are prone to inconsistency, error accumulation, and prohibitive cost as constraints compound. Conversely, LLMs combined with coding- or solver-based strategies lack flexibility: they often generate problem-specific code from scratch or depend on fixed solvers, failing to capture generalizable logic across diverse problems. To address these challenges, we introduce the Scalable COde Planning Engine (SCOPE), a framework that disentangles query-specific reasoning from generic code execution. By separating reasoning from execution, SCOPE produces solver functions that are consistent, deterministic, and reusable across queries while requiring only minimal changes to input parameters. SCOPE achieves state-of-the-art performance while lowering cost and latency. For example, with GPT-4o, it reaches 93.1% success on TravelPlanner, a 61.6% gain over the best baseline (CoT) while cutting inference cost by 1.4x and time by ~4.67x. Code is available at https://github.com/DerrickGXD/SCOPE.
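The separation the abstract describes, with query-specific reasoning on one side and generic code execution on the other, can be illustrated with a minimal sketch. This is not the SCOPE implementation: `solve_plan`, `budget_at_most`, `excludes`, and the toy `OPTIONS` data are hypothetical names invented for illustration, and the constraint parameters are written by hand here, whereas in the framework an LLM would be expected to derive them from each query.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical data model (not from the SCOPE codebase): each slot, such as
# "hotel" or "meal", maps to a list of candidate (name, cost) options.
Options = Dict[str, List[Tuple[str, float]]]
Plan = Dict[str, Tuple[str, float]]
Constraint = Callable[[Plan], bool]


def solve_plan(options: Options, constraints: List[Constraint]) -> Optional[Plan]:
    """Generic, deterministic solver: return the cheapest plan that satisfies
    every constraint, or None if no candidate does."""
    slots = sorted(options)  # fixed slot order keeps the search deterministic
    best: Optional[Plan] = None
    best_cost = float("inf")
    for combo in product(*(options[s] for s in slots)):
        plan = dict(zip(slots, combo))
        cost = sum(price for _, price in combo)
        # Strict "<" means the first plan found wins any tie on cost.
        if cost < best_cost and all(check(plan) for check in constraints):
            best, best_cost = plan, cost
    return best


def budget_at_most(limit: float) -> Constraint:
    """Build a constraint from a query-specific parameter (the budget)."""
    return lambda plan: sum(price for _, price in plan.values()) <= limit


def excludes(slot: str, name: str) -> Constraint:
    """Build a constraint that forbids one named option in one slot."""
    return lambda plan: plan[slot][0] != name


OPTIONS: Options = {
    "hotel": [("Inn", 80.0), ("Resort", 200.0)],
    "meal": [("Diner", 15.0), ("Bistro", 40.0)],
}

# Different queries reuse the same solver; only the input parameters change.
plan_a = solve_plan(OPTIONS, [budget_at_most(150.0)])
plan_b = solve_plan(OPTIONS, [budget_at_most(300.0), excludes("hotel", "Inn")])
plan_c = solve_plan(OPTIONS, [budget_at_most(50.0)])  # infeasible budget

print(plan_a)
print(plan_b)
print(plan_c)
```

The exhaustive search is only practical for tiny option sets and was chosen to keep the sketch short. The point is that the solver body stays fixed across the three queries while the constraint parameters vary.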
