2601.20467v1 Jan 28, 2026 cs.AI

CtrlCoT: Dual-Granularity Chain-of-Thought Compression for Controllable Reasoning

Zheqi Lv (Citations: 20, h-index: 2)
Wenqiao Zhang (Citations: 33, h-index: 3)
Zhongle Xie (Citations: 9, h-index: 2)
Bengchin Ooi (Citations: 360, h-index: 8)
Zhenxuan Fan (Citations: 11, h-index: 2)
Jie Cao (Citations: 143, h-index: 4)
Yang Dai (Citations: 18, h-index: 2)
Peng Lu (Citations: 6, h-index: 2)

Chain-of-thought (CoT) prompting improves LLM reasoning but incurs high latency and memory cost due to verbose traces, motivating CoT compression that preserves correctness. Existing methods either shorten CoTs at the semantic level, which is often conservative, or prune tokens aggressively, which can miss task-critical cues and degrade accuracy. Moreover, combining the two is non-trivial because of sequential dependency, task-agnostic pruning, and distribution mismatch. We propose CtrlCoT, a dual-granularity CoT compression framework that harmonizes semantic abstraction and token-level pruning through three components: Hierarchical Reasoning Abstraction produces CoTs at multiple semantic granularities; Logic-Preserving Distillation trains a logic-aware pruner to retain indispensable reasoning cues (e.g., numbers and operators) across pruning ratios; and Distribution-Alignment Generation aligns compressed traces with fluent inference-time reasoning styles to avoid fragmentation. On MATH-500 with Qwen2.5-7B-Instruct, CtrlCoT uses 30.7% fewer tokens while scoring 7.6 percentage points higher than the strongest baseline, demonstrating more efficient and reliable reasoning. Our code will be publicly available at https://github.com/fanzhenxuan/Ctrl-CoT.
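The abstract gives no implementation details, so the following is only a toy illustration of the idea behind logic-aware token pruning: drop tokens down to a target ratio while never removing numbers or operators. It is a rule-based stand-in, not the trained pruner described under Logic-Preserving Distillation. The names `prune_cot`, `PROTECTED`, and `keep_ratio`, and the length-based importance score, are all hypothetical.

```python
import re

# Hypothetical rule standing in for a learned pruner: tokens matching this
# pattern (numbers and basic operators) are never dropped.
PROTECTED = re.compile(r"^(\d+(\.\d+)?|[+\-*/=<>^()])$")


def prune_cot(tokens, keep_ratio, importance=None):
    """Keep roughly keep_ratio of the tokens, always retaining protected ones."""
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in [0, 1]")
    if importance is None:
        # Toy importance score (an assumption): longer tokens rank higher.
        importance = [len(t) for t in tokens]

    protected = {i for i, t in enumerate(tokens) if PROTECTED.match(t)}
    # The budget can never fall below the number of protected tokens.
    budget = max(round(keep_ratio * len(tokens)), len(protected))

    # Fill the remaining budget with the highest-scoring unprotected tokens,
    # breaking ties by original position.
    others = sorted(
        (i for i in range(len(tokens)) if i not in protected),
        key=lambda i: (-importance[i], i),
    )
    keep = protected | set(others[: budget - len(protected)])
    return [t for i, t in enumerate(tokens) if i in keep]


trace = "so we add 3 + 4 = 7 and then multiply 7 * 2 = 14".split()
pruned = prune_cot(trace, keep_ratio=0.7)
print(" ".join(pruned))  # 3 + 4 = 7 multiply 7 * 2 = 14
```

Even at `keep_ratio=0.0` this sketch returns every number and operator, which mirrors the stated goal of retaining indispensable reasoning cues across pruning ratios; a learned pruner would replace both the regular expression and the importance score.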
