2601.21358v1 Jan 29, 2026 cs.AI

Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

Chunyang Liu

Citations: 198

h-index: 7

Jiecong Wang

Citations: 2

h-index: 1

Hao Peng

Citations: 44

h-index: 3

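Chain-of-Thought (CoT) enables large language models (LLMs) to solve complex problems, but it is constrained by the computational cost and reasoning-path collapse that arise when reasoning is grounded in a discrete token space. Recent latent reasoning approaches try to improve efficiency by carrying out reasoning within continuous hidden states. However, these methods generally operate as opaque end-to-end mappings from explicit reasoning steps to latent states, and they often require a predefined number of latent steps at inference time. This work introduces PLaT (Planning with Latent Thoughts), a framework that recasts latent reasoning as planning by fundamentally decoupling reasoning from verbalization. Reasoning is modeled as a deterministic trajectory of latent planning states, while a separate decoder grounds these thoughts into text when needed. This decoupling lets the model decide dynamically when to stop reasoning instead of relying on a fixed hyperparameter. Empirical results on mathematical benchmarks show a clear trade-off: PLaT has lower greedy accuracy than the baselines but scales better in terms of reasoning diversity. This indicates that PLaT learns a robust and broader solution space, providing a transparent and scalable foundation for inference-time search.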

Original Abstract

Chain-of-Thought (CoT) empowers Large Language Models (LLMs) to tackle complex problems, but remains constrained by the computational cost and reasoning path collapse when grounded in discrete token spaces. Recent latent reasoning approaches attempt to optimize efficiency by performing reasoning within continuous hidden states. However, these methods typically operate as opaque end-to-end mappings from explicit reasoning steps to latent states, and often require a pre-defined number of latent steps during inference. In this work, we introduce PLaT (Planning with Latent Thoughts), a framework that reformulates latent reasoning as planning by fundamentally decoupling reasoning from verbalization. We model reasoning as a deterministic trajectory of latent planning states, while a separate Decoder grounds these thoughts into text when necessary. This decoupling allows the model to dynamically determine when to terminate reasoning rather than relying on fixed hyperparameters. Empirical results on mathematical benchmarks reveal a distinct trade-off: while PLaT achieves lower greedy accuracy than baselines, it demonstrates superior scalability in terms of reasoning diversity. This indicates that PLaT learns a robust, broader solution space, offering a transparent and scalable foundation for inference-time search.
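
The control flow described in the abstract (a deterministic latent trajectory, a separate decoder that verbalizes only on request, and a learned rather than fixed stopping point) can be illustrated with a toy sketch. This is not the authors' implementation: `plan_latent`, `verbalize`, `toy_step`, `toy_should_stop`, and `toy_decode` are hypothetical names, the "latent state" is a plain list of floats, and `max_steps` is only a safety cap assumed for the example.

```python
from typing import Callable, List

State = List[float]  # hypothetical stand-in for a latent planning state


def plan_latent(
    initial: State,
    step: Callable[[State], State],
    should_stop: Callable[[State], bool],
    max_steps: int = 32,
) -> List[State]:
    """Roll out a deterministic latent trajectory.

    The loop ends when `should_stop` fires, so the number of latent steps
    depends on the input; `max_steps` is only a safety cap (an assumption).
    """
    trajectory = [initial]
    state = initial
    for _ in range(max_steps):
        if should_stop(state):
            break
        state = step(state)
        trajectory.append(state)
    return trajectory


def verbalize(trajectory: List[State], decode: Callable[[State], str]) -> List[str]:
    """Ground latent states into text with a separate decoder, only on request."""
    return [decode(state) for state in trajectory]


# Toy stand-ins for learned components (assumptions for illustration only).
def toy_step(state: State) -> State:
    return [0.5 * x for x in state]


def toy_should_stop(state: State) -> bool:
    return max(abs(x) for x in state) < 0.1


def toy_decode(state: State) -> str:
    return "latent state with max magnitude %.4f" % max(abs(x) for x in state)


trajectory = plan_latent([1.0, -0.8], toy_step, toy_should_stop)
thoughts = verbalize(trajectory, toy_decode)
```

Planning runs to completion without calling the decoder; `verbalize` is a separate pass, which is the decoupling the abstract refers to. A different starting state would yield a different trajectory length under the same stopping rule.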

2 Citations

0 Influential

3.5 Altmetric

19.5 Score
