2601.21358v2 Jan 29, 2026 cs.AI

Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

Chunyang Liu

Jiecong Wang

Hao Peng

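Chain-of-Thought (CoT) helps large language models (LLMs) solve complex problems, but when grounded in a discrete token space it incurs high computational cost and suffers from reasoning path collapse. Recent latent reasoning methods try to improve efficiency by reasoning within continuous hidden states. However, these methods typically operate as opaque end-to-end mappings from explicit reasoning steps to latent states, and often require a predefined number of latent steps during inference. This work introduces PLaT (Planning with Latent Thoughts), a framework that reframes latent reasoning as planning. PLaT fundamentally decouples reasoning from verbalization: it models reasoning as a deterministic trajectory of latent planning states, and a separate decoder converts these thoughts into text when needed. This decoupling lets the model decide dynamically when to stop reasoning rather than rely on a fixed hyperparameter. Experiments on mathematical benchmarks show a clear trade-off: PLaT achieves lower greedy accuracy than the baselines but scales better in terms of reasoning diversity. This suggests that PLaT learns a robust, broader solution space and offers a transparent, scalable foundation for inference-time search. The code is available at https://github.com/yunsaijc/PLaT.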

Original Abstract

Chain-of-Thought (CoT) empowers Large Language Models (LLMs) to tackle complex problems, but remains constrained by computational cost and reasoning path collapse when grounded in discrete token spaces. Recent latent reasoning approaches attempt to optimize efficiency by performing reasoning within continuous hidden states. However, these methods typically operate as opaque end-to-end mappings from explicit reasoning steps to latent states, and often require a pre-defined number of latent steps during inference. In this work, we introduce PLaT (Planning with Latent Thoughts), a framework that reformulates latent reasoning as planning by fundamentally decoupling reasoning from verbalization. We model reasoning as a deterministic trajectory of latent planning states, while a separate Decoder grounds these thoughts into text when necessary. This decoupling allows the model to dynamically determine when to terminate reasoning rather than relying on fixed hyperparameters. Empirical results on mathematical benchmarks reveal a distinct trade-off: while PLaT achieves lower greedy accuracy than baselines, it demonstrates superior scalability in terms of reasoning diversity. This indicates that PLaT learns a robust, broader solution space, offering a transparent and scalable foundation for inference-time search. Our code can be found at https://github.com/yunsaijc/PLaT.
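
The separation the abstract describes (a deterministic latent trajectory, a separate decoder that verbalizes states, and a stopping point that is not fixed in advance) can be pictured with a toy control loop. The sketch below is purely illustrative and is not taken from the PLaT code: `plan_and_verbalize`, `toy_step`, `toy_decode`, and the `<stop>` marker are hypothetical names, and letting the decoder signal termination is an assumption made here for simplicity, since the abstract does not say which component decides when to stop.

```python
from typing import Callable, List, Tuple

# A "latent state" is just a tuple of floats in this toy; a real model
# would use hidden-state vectors. Everything below is hypothetical.
State = Tuple[float, ...]


def plan_and_verbalize(
    initial_state: State,
    step: Callable[[State], State],
    decode: Callable[[State], str],
    stop_token: str = "<stop>",
    max_steps: int = 32,
) -> List[str]:
    """Roll out a deterministic latent trajectory, verbalizing each state.

    The loop ends when the decoder emits `stop_token` (an assumed
    mechanism), so the number of reasoning steps is not fixed up front.
    `max_steps` is only a safety cap against non-terminating rollouts.
    """
    state = initial_state
    verbalized: List[str] = []
    for _ in range(max_steps):
        text = decode(state)          # verbalization, separate from planning
        if text == stop_token:        # dynamic termination
            break
        verbalized.append(text)
        state = step(state)           # deterministic latent transition
    return verbalized


def toy_step(state: State) -> State:
    # Toy planner: count down by one per step.
    return (state[0] - 1.0,)


def toy_decode(state: State) -> str:
    # Toy decoder: describe the state, or signal that reasoning is done.
    return "<stop>" if state[0] <= 0 else f"remaining={state[0]:.0f}"


trace = plan_and_verbalize((3.0,), toy_step, toy_decode)
print(trace)
```

Because the planner and the decoder are separate callables, the same latent rollout could be run without verbalizing every state; the loop decodes at each step only to keep the toy short.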
