2604.27536v1 Apr 30, 2026 cs.AI

Belief-Guided Inference Control for Large Language Model Services via Verifiable Observations

Original Abstract

In black-box large language model (LLM) services, response reliability is often only partially observable at decision time, while stronger inference pathways incur substantial computational cost. Together these induce a budgeted sequential decision problem: for each request, the system must decide whether the default low-cost response is sufficiently reliable or whether additional computation should be allocated to improve response quality. In this paper, we propose Verifiable Observations for Risk-aware Inference Control (Veroic), a framework for adaptive inference control in black-box LLM settings. Veroic formulates request-time control as a partially observable Markov decision process (POMDP) to capture both partial observability and the sequential coupling of decisions through the budget. It constructs a lightweight verifiable observation channel from the input-output pair, aggregating heterogeneous quality signals into a belief state over latent response reliability. A budget-aware policy then uses this belief state to decide whether to return the default output or trigger a higher-cost inference pathway. Experiments on diverse tasks show that Veroic achieves better quality-cost trade-offs, stronger risk estimation and calibration, and more robust long-horizon inference control than competitive baselines.
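The abstract describes two pieces: fusing several quality signals into a belief about whether a response is reliable, and a budget-aware rule that uses that belief to either return the default output or escalate. The sketch below illustrates that loop under our own assumptions and is not the authors' implementation. Each signal is assumed to arrive as a likelihood ratio and to be conditionally independent given the latent reliability (plain naive-Bayes log-odds fusion). The policy is a hypothetical heuristic threshold that tightens as the remaining budget shrinks, standing in for a solved POMDP policy. All names (`belief_reliable`, `BudgetAwarePolicy`, `base_threshold`) are illustrative.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def belief_reliable(prior: float, likelihood_ratios: list[float]) -> float:
    """Posterior P(reliable | signals), ASSUMING the signals are conditionally
    independent given the latent reliability (naive-Bayes fusion).

    Each ratio is P(signal | reliable) / P(signal | unreliable); values > 1
    are evidence for reliability, values < 1 are evidence against it.
    """
    log_odds = logit(prior) + sum(math.log(r) for r in likelihood_ratios)
    return sigmoid(log_odds)


@dataclass
class BudgetAwarePolicy:
    """Hypothetical threshold rule standing in for a learned POMDP policy."""

    budget: float                # remaining extra cost available over the horizon
    escalation_cost: float       # cost of one call on the higher-cost pathway
    base_threshold: float = 0.7  # escalate below this belief when budget is ample

    def decide(self, belief: float, requests_left: int) -> str:
        """Return "escalate" or "return_default" for the current request."""
        if self.budget < self.escalation_cost:
            return "return_default"  # cannot afford escalation at all
        # Fraction of the remaining requests the budget could still escalate.
        coverage = min(1.0, self.budget / (self.escalation_cost * max(requests_left, 1)))
        # Scarcer budget -> lower threshold -> escalate only low-belief responses.
        threshold = self.base_threshold * coverage
        if belief < threshold:
            self.budget -= self.escalation_cost
            return "escalate"
        return "return_default"


if __name__ == "__main__":
    policy = BudgetAwarePolicy(budget=3.0, escalation_cost=1.0)
    # Two signals that both count against reliability: belief drops to about 0.29.
    weak = belief_reliable(prior=0.8, likelihood_ratios=[0.2, 0.5])
    print(policy.decide(weak, requests_left=3))    # escalate
    # One signal that supports reliability: belief rises to about 0.92.
    strong = belief_reliable(prior=0.8, likelihood_ratios=[3.0])
    print(policy.decide(strong, requests_left=2))  # return_default
```

The fixed threshold is only a placeholder for the decision rule; in a POMDP treatment the return-or-escalate choice would come from a policy optimized over the belief state and remaining budget, which this heuristic merely approximates by scaling the threshold with budget coverage.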
