2604.16565v1 Apr 17, 2026 cs.LG

Reasoning on the Manifold: Bidirectional Consistency for Self-Verification in Diffusion Language Models

Authors:

Xin Gao (citations: 100, h-index: 6)
Jiaoyang Ruan (citations: 1, h-index: 1)
Jian Pu (citations: 22, h-index: 3)
Hengyu Zeng (citations: 2, h-index: 1)
Yinda Chen (citations: 0, h-index: 0)
Liang Du (citations: 20, h-index: 3)
Guanghao Li (citations: 9, h-index: 2)
Jie Fu (citations: 30, h-index: 2)

Abstract

While Diffusion Large Language Models (dLLMs) offer structural advantages for global planning, efficiently verifying that they arrive at correct answers via valid reasoning traces remains a critical challenge. In this work, we propose a geometric perspective: Reasoning on the Manifold. We hypothesize that valid generation trajectories reside as stable attractors on the high-density manifold of the learned distribution, whereas invalid paths exhibit off-manifold drift. To operationalize this, we introduce Bidirectional Manifold Consistency (BMC), a training-free, unsupervised metric that quantifies the stability of a generated sequence through a forward-masking and backward-reconstruction cycle. Empirically, we demonstrate BMC's versatility across the full reasoning lifecycle: (1) in Diagnosis, it serves as a robust discriminator of solution validity without access to ground-truth answers; (2) in Inference, it enables rejection resampling that concentrates computational resources on complex reasoning tasks; and (3) in Alignment, it functions as a dense geometric reward that transforms sparse outcome supervision into fine-grained guidance, empowering models to self-evolve beyond standard baselines. Our results establish intrinsic geometric stability as a robust indicator of correctness for dLLMs.
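The abstract does not give the exact formula for BMC, so the following is only a toy sketch of the general idea of a mask-and-reconstruct stability check followed by rejection resampling. Everything in it is an assumption made for illustration: the names `consistency_score`, `toy_reconstruct`, and `rejection_resample`, the strided masking schedule, the agreement-rate score, and the threshold are hypothetical, and `toy_reconstruct` is a stand-in for a real dLLM denoiser (it simply assumes that "on-manifold" sequences are consecutive integers).

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Token = Optional[int]
MASK: Token = None  # hypothetical mask symbol, not taken from the paper


def strided_masks(n: int, stride: int) -> List[List[int]]:
    """Split positions 0..n-1 into groups so every position is masked once."""
    return [list(range(r, n, stride)) for r in range(min(stride, n))]


def consistency_score(
    tokens: Sequence[int],
    reconstruct: Callable[[List[Token]], List[int]],
    stride: int = 4,
) -> float:
    """Fraction of masked positions that the reconstructor restores exactly.

    Assumed proxy for stability: mask part of the sequence (forward step),
    let the model fill it back in (backward step), and count agreements.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    agree = 0
    for group in strided_masks(len(tokens), stride):
        masked: List[Token] = list(tokens)
        for i in group:
            masked[i] = MASK
        rebuilt = reconstruct(masked)
        agree += sum(rebuilt[i] == tokens[i] for i in group)
    return agree / len(tokens)  # each position is masked exactly once


def toy_reconstruct(masked: List[Token]) -> List[int]:
    """Stand-in denoiser: assumes valid sequences satisfy token = index + c."""
    offsets = Counter(t - i for i, t in enumerate(masked) if t is not MASK)
    offset = offsets.most_common(1)[0][0] if offsets else 0
    return [i + offset if t is MASK else t for i, t in enumerate(masked)]


def rejection_resample(
    sample: Callable[[], Sequence[int]],
    reconstruct: Callable[[List[Token]], List[int]],
    threshold: float = 0.9,
    max_tries: int = 8,
) -> Tuple[Sequence[int], float]:
    """Draw candidates until one clears the threshold; else return the best."""
    best: Sequence[int] = []
    best_score = -1.0
    for _ in range(max_tries):
        cand = sample()
        score = consistency_score(cand, reconstruct)
        if score >= threshold:
            return cand, score
        if score > best_score:
            best, best_score = cand, score
    return best, best_score


if __name__ == "__main__":
    valid = [3, 4, 5, 6, 7, 8, 9, 10]    # consistent with the toy "manifold"
    invalid = [3, 4, 9, 6, 2, 8, 9, 10]  # two tokens drift off it
    print(consistency_score(valid, toy_reconstruct))    # 1.0
    print(consistency_score(invalid, toy_reconstruct))  # 0.75
```

In this toy setting the consistent sequence is fully recovered after masking, while the corrupted one is not, so a threshold on the score separates the two without any reference answer. A real implementation would replace `toy_reconstruct` with the diffusion model's own denoising pass and would likely use model probabilities rather than exact-match agreement.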

Citations: 0; Influential citations: 0; Altmetric: 3; Score: 15.0
