2604.02910v1 Apr 03, 2026 cs.AI

계획 문제에 대한 대규모 언어 모델의 최적성 분석

Analysis of Optimality of Large Language Models on Planning Problems

Bernd Bohnet

Citations: 6,768

h-index: 8

Noah Fiedel

Citations: 24,079

h-index: 20

Kathleen Kenealy

Citations: 6,306

h-index: 9

William A. Cunningham

Citations: 147

h-index: 5

Michael C. Mozer

Citations: 7

h-index: 1

Kevin Swersky

Citations: 681

h-index: 3

Aaron Parisi

Citations: 3,057

h-index: 5

최근 대규모 언어 모델(LLM) 시대에, 기존의 인공지능 계획 문제들이 재조명되고 있으며, 최근 벤치마크는 계획 효율보다는 성공률에 중점을 두고 있습니다. 본 연구에서는 최첨단 모델들이 단순하고 휴리스틱적이며, 잠재적으로 비효율적인 전략에 의존하는 대신, 얼마나 최적으로 추론하는지를 분석합니다. 우리는 레이블이 붙은 블록으로 이루어진 타워를 초기 상태에서 목표 상태로 이동시키는 Blocksworld 도메인을 중심으로 연구를 진행합니다. 또한, 진정한 위상 추론과 의미 기반 정보를 분리하기 위해, 형식적으로 동일한 Path-Star ($P^*$) 그래프 문제를 연구합니다. 우리는 문제의 깊이(블록 타워의 높이), 폭(타워의 개수), 그리고 복합성(목표 블록의 개수)을 체계적으로 조작합니다. 추론 능력이 강화된 LLM은 복잡하고 다중 목표 구성에서 기존의 만족형 계획기(예: LAMA)보다 훨씬 뛰어난 성능을 보입니다. 고전적인 탐색 알고리즘은 탐색 공간이 확장됨에 따라 한계에 부딪히지만, LLM은 도메인별 의미 단서가 제거된 경우에도 이론적인 최적성 한계를 거의 완벽하게 정확하게 추적합니다. 이러한 놀라운 결과를 설명하기 위해, 우리는 두 가지 가설을 고려하고, 그에 대한 증거를 찾았습니다. 첫째는 추론 토큰을 통해 실행되는 능동적인 알고리즘 시뮬레이션이며, 둘째는 $P^*$ 위상을 탐색 가능한 글로벌 기하학으로 표현할 수 있는 기하학적 메모리로서, 이는 지수적인 조합 복잡성을 효과적으로 우회합니다.

Original Abstract

Classic AI planning problems have been revisited in the Large Language Model (LLM) era, with a focus of recent benchmarks on success rates rather than plan efficiency. We examine the degree to which frontier models reason optimally versus relying on simple, heuristic, and possibly inefficient strategies. We focus on the Blocksworld domain involving towers of labeled blocks which have to be moved from an initial to a goal configuration via a set of primitive actions. We also study a formally equivalent task, the generalized Path-Star ($P^*$) graph, in order to isolate true topological reasoning from semantic priors. We systematically manipulate problem depth (the height of block towers), width (the number of towers), and compositionality (the number of goal blocks). Reasoning-enhanced LLMs significantly outperform traditional satisficing planners (e.g., LAMA) in complex, multi-goal configurations. Although classical search algorithms hit a wall as the search space expands, LLMs track theoretical optimality limits with near-perfect precision, even when domain-specific semantic hints are stripped away. To explain these surprising findings, we consider (and find evidence to support) two hypotheses: an active Algorithmic Simulation executed via reasoning tokens and a Geometric Memory that allows models to represent the $P^*$ topology as a navigable global geometry, effectively bypassing exponential combinatorial complexity.

0 Citations

0 Influential

10 Altmetric

50.0 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!