2601.08808v1 Jan 13, 2026 cs.CL

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

Furu Wei (Citations: 249; h-index: 8)
Yao Tang (Citations: 76; h-index: 5)
Li Dong (Citations: 645; h-index: 11)
Y. Hao (Citations: 4,724; h-index: 18)
Qingxiu Dong (Citations: 314; h-index: 8)
Jiatao Gu (Citations: 534; h-index: 11)

Original Abstract

Large language models often solve complex reasoning tasks more effectively with Chain-of-Thought (CoT), but at the cost of long, low-bandwidth token sequences. Humans, by contrast, often reason softly by maintaining a distribution over plausible next steps. Motivated by this, we propose Multiplex Thinking, a stochastic soft reasoning mechanism that, at each thinking step, samples K candidate tokens and aggregates their embeddings into a single continuous multiplex token. This preserves the vocabulary embedding prior and the sampling dynamics of standard discrete generation, while inducing a tractable probability distribution over multiplex rollouts. Consequently, multiplex trajectories can be directly optimized with on-policy reinforcement learning (RL). Importantly, Multiplex Thinking is self-adaptive: when the model is confident, the multiplex token is nearly discrete and behaves like standard CoT; when it is uncertain, it compactly represents multiple plausible next steps without increasing sequence length. Across challenging math reasoning benchmarks, Multiplex Thinking consistently outperforms strong discrete CoT and RL baselines from Pass@1 through Pass@1024, while producing shorter sequences. The code and checkpoints are available at https://github.com/GMLR-Penn/Multiplex-Thinking.
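As a rough illustration of the branch-and-merge step described in the abstract, the sketch below samples K token ids from the next-token distribution and merges their embeddings into one continuous vector. It is not the authors' implementation. The function name `multiplex_step`, the plain averaging used as the aggregation rule, and the summed log-probability of the K samples are all assumptions made for illustration; the abstract only states that K candidates are sampled and their embeddings aggregated.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multiplex_step(logits, embeddings, k, rng):
    """One hypothetical multiplex step (illustrative, not the paper's code).

    Samples k token ids independently from softmax(logits), averages
    their embeddings (assumed aggregation rule), and returns the sampled
    ids, the merged vector, and the summed log-probability of the samples.
    """
    probs = softmax(logits)
    # Branch: draw k candidate tokens with replacement.
    ids = rng.choices(range(len(probs)), weights=probs, k=k)
    # Merge: average the k embeddings into a single continuous vector.
    dim = len(embeddings[0])
    merged = [sum(embeddings[i][d] for i in ids) / k for d in range(dim)]
    # Assumed score for the step: joint log-probability of the k draws.
    log_prob = sum(math.log(probs[i]) for i in ids)
    return ids, merged, log_prob


if __name__ == "__main__":
    rng = random.Random(0)
    toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    # Peaked logits: the samples agree, so the merged vector is one embedding.
    print(multiplex_step([50.0, 0.0, 0.0], toy_embeddings, 4, rng))
    # Flat logits: the samples may differ, so the merged vector is a blend.
    print(multiplex_step([0.0, 0.0, 0.0], toy_embeddings, 4, rng))
```

Under these assumptions the self-adaptive behavior falls out of the sampling itself: when one token dominates the distribution, all K draws coincide and the merged vector equals that token's embedding, as in standard discrete decoding. When the distribution is flat, the merged vector blends several candidates while still occupying a single sequence position.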

9 Citations

1 Influential

53.414009612932 Altmetric

278.1 Score
