2604.01413v2 Apr 01, 2026 cs.CL

Adaptive Stopping for Multi-Turn LLM Reasoning

Bo Yu

Xiaofang Zhou

Chenxi Liu

Huy Nguyen

Lu Cheng

Abstract

Large Language Models (LLMs) increasingly rely on multi-turn reasoning and interaction, such as adaptive retrieval-augmented generation (RAG) and ReAct-style agents, to answer difficult questions. These methods improve accuracy by iteratively retrieving information, reasoning, or acting, but introduce a key challenge: when should the model stop? Existing approaches rely on heuristic stopping rules or fixed turn budgets and provide no formal guarantee that the final prediction still contains the correct answer. This limitation is particularly problematic in high-stakes domains such as finance and healthcare, where unnecessary turns increase cost and latency, while stopping too early risks incorrect decisions. Conformal prediction (CP) provides formal coverage guarantees, but existing LLM-CP methods apply only to a single model output and cannot handle multi-turn pipelines with adaptive stopping. To address this gap, we propose Multi-Turn Language Models with Conformal Prediction (MiCP), the first CP framework for multi-turn reasoning. MiCP allocates different error budgets across turns, enabling the model to stop early while maintaining an overall coverage guarantee. We demonstrate MiCP on adaptive RAG and ReAct, where it achieves the target coverage on both single-hop and multi-hop question answering benchmarks while reducing the number of turns, inference cost, and prediction set size. We further introduce a new metric that jointly evaluates coverage validity and answering efficiency.
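
The abstract does not spell out how the per-turn error budgets are computed, so the following is only a rough sketch of the general idea, not MiCP itself. It assumes a plain split-conformal threshold per turn and a union-bound allocation in which the per-turn budgets sum to the overall error level; the function names, the score convention (lower means more plausible), and the "stop when the set is small" rule are all hypothetical choices made for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: accept every candidate.
        return math.inf
    return sorted(scores)[rank - 1]


def calibrate(cal_scores_by_turn, alphas):
    """One threshold per turn.

    cal_scores_by_turn[t] holds the nonconformity score of the TRUE answer at
    turn t for every calibration question (all turns are run during calibration).
    alphas[t] is the error budget of turn t; under the union-bound assumption
    used here, sum(alphas) is the overall miscoverage level.
    """
    return [conformal_quantile(s, a) for s, a in zip(cal_scores_by_turn, alphas)]


def run_with_stopping(candidate_scores_by_turn, thresholds, max_set_size=1):
    """Return (turn, prediction_set) at the first turn whose set is small enough.

    candidate_scores_by_turn[t] maps each candidate answer to its score at turn t.
    The last turn always returns, even if its set is empty or large.
    """
    last = len(thresholds) - 1
    for turn, (cands, q) in enumerate(zip(candidate_scores_by_turn, thresholds)):
        pred_set = {ans for ans, score in cands.items() if score <= q}
        if turn == last or 0 < len(pred_set) <= max_set_size:
            return turn, pred_set


# Hypothetical calibration scores for a two-turn pipeline.
cal = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],  # turn 0
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],  # turn 1
]
thresholds = calibrate(cal, alphas=[0.25, 0.05])

# Hypothetical test question: candidate answers scored at each turn.
test_question = [{"A": 0.5, "B": 0.95}, {"A": 0.2, "B": 0.9}]
stop_turn, answer_set = run_with_stopping(test_question, thresholds)
```

Under these assumptions, giving a turn a larger share of the budget lowers its threshold, so its prediction sets are tighter and a one-element set (and hence an early stop) becomes more likely, at the price of a smaller budget for the remaining turns. How MiCP actually divides the budget is described in the paper, not here.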

