2602.00929v1 Jan 31, 2026 cs.AI

Learning Abstractions for Hierarchical Planning in Program-Synthesis Agents

Zergham Ahmed (Citations: 13, h-index: 1)

Kazuki Irie (Citations: 53, h-index: 3)

Joshua B. Tenenbaum (Citations: 3,015, h-index: 9)

Christopher J. Bates (Citations: 13, h-index: 1)

Samuel Gershman (Citations: 195, h-index: 3)

Original Abstract

Humans learn abstractions and use them to plan efficiently and generalize quickly across tasks -- an ability that remains challenging for state-of-the-art large language model (LLM) agents and deep reinforcement learning (RL) systems. Inspired by the cognitive science of how people form abstractions and intuitive theories of their world knowledge, Theory-Based RL (TBRL) systems, such as TheoryCoder, exhibit strong generalization through effective use of abstractions. However, they rely heavily on human-provided abstractions and sidestep the abstraction-learning problem. We introduce TheoryCoder-2, a new TBRL agent that leverages LLMs' in-context learning ability to actively learn reusable abstractions rather than relying on hand-specified ones, by synthesizing abstractions from experience and integrating them into a hierarchical planning process. We conduct experiments on diverse environments, including BabyAI, MiniHack, and VGDL games like Sokoban. We find that TheoryCoder-2 is significantly more sample-efficient than baseline LLM agents augmented with classical planning domain construction or reasoning-based planning, and than prior program-synthesis agents such as WorldCoder. TheoryCoder-2 solves complex tasks on which the baselines fail while requiring only minimal human prompts, unlike prior TBRL systems.
