2604.23530v1 Apr 26, 2026 cs.CL

MTRouter: Cost-Aware Multi-Turn LLM Routing with History-Model Joint Embeddings

Shi Feng (Citations: 105; h-index: 5)

Xiaocui Yang (Citations: 814; h-index: 13)

Daling Wang (Citations: 2,694; h-index: 25)

Hao Li (Citations: 121; h-index: 5)

Shuyue Hu (Citations: 209; h-index: 7)

Lei Bai (Citations: 64; h-index: 4)

Yiqun Zhang (Citations: 84; h-index: 4)

Zihan Wang (Citations: 15; h-index: 2)

Bo Zhang (Citations: 15; h-index: 1)

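Multi-turn, long-horizon tasks are becoming increasingly common for large language models (LLMs), but solving them usually takes many sequential model calls, which adds up to substantial inference cost. This work addresses cost-aware multi-turn LLM routing: at each turn, choose which model from a model pool to invoke while staying within a fixed cost budget. The authors propose MTRouter, which encodes the interaction history and the candidate models into joint history-model embeddings and trains an outcome estimator on logged trajectories to predict each model's utility at each turn. In experiments, MTRouter improves the performance-cost trade-off. On ScienceWorld it outperforms GPT-5 while cutting total cost by 58.7%. On Humanity's Last Exam (HLE) it reaches accuracy competitive with GPT-5 while cutting total cost by 43.4%, and these gains also carry over to held-out tasks. Further analysis points to several mechanisms behind its effectiveness: compared with prior multi-turn routers, MTRouter switches models less often, is more robust to transient errors, and shows emergent specialization across models. Code: https://github.com/ZhangYiqun018/MTRouter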

Original Abstract

Multi-turn, long-horizon tasks are increasingly common for large language models (LLMs), but solving them typically requires many sequential model invocations, accumulating substantial inference costs. Here, we study cost-aware multi-turn LLM routing: selecting which model to invoke at each turn from a model pool, given a fixed cost budget. We propose MTRouter, which encodes the interaction history and candidate models into joint history-model embeddings, and learns an outcome estimator from logged trajectories to predict turn-level model utility. Experiments show that MTRouter improves the performance-cost trade-off: on ScienceWorld, it surpasses GPT-5 while reducing total cost by 58.7%; on Humanity's Last Exam (HLE), it achieves competitive accuracy while reducing total cost by 43.4% relative to GPT-5, and these gains even carry over to held-out tasks. Further analyses reveal several mechanisms underlying its effectiveness: relative to prior multi-turn routers, MTRouter makes fewer model switches, is more tolerant to transient errors, and exhibits emergent specialization across models. Code: https://github.com/ZhangYiqun018/MTRouter
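The routing loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Candidate` type, the concatenate-plus-product `joint_embedding`, the linear `estimate_utility`, and the `cost_penalty` term are all hypothetical stand-ins chosen for illustration. The abstract only states that history and candidate models are jointly embedded, that a learned estimator predicts turn-level utility, and that selection respects a cost budget.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A model in the pool (all fields are illustrative assumptions)."""
    name: str
    embedding: tuple[float, ...]  # hypothetical fixed-size model embedding
    cost_per_turn: float          # hypothetical cost of one invocation


def joint_embedding(history_vec: Sequence[float],
                    model_vec: Sequence[float]) -> list[float]:
    """Combine history and model vectors: [history, model, history * model]."""
    interaction = [h * m for h, m in zip(history_vec, model_vec)]
    return list(history_vec) + list(model_vec) + interaction


def estimate_utility(weights: Sequence[float],
                     features: Sequence[float]) -> float:
    """Linear stand-in for an outcome estimator trained on logged trajectories."""
    return sum(w * f for w, f in zip(weights, features))


def route_turn(history_vec: Sequence[float],
               pool: Sequence[Candidate],
               weights: Sequence[float],
               remaining_budget: float,
               cost_penalty: float) -> Optional[Candidate]:
    """Return the affordable model with the best utility-minus-cost score."""
    affordable = [m for m in pool if m.cost_per_turn <= remaining_budget]
    if not affordable:
        return None  # budget exhausted: no model can be invoked this turn

    def score(m: Candidate) -> float:
        features = joint_embedding(history_vec, m.embedding)
        return estimate_utility(weights, features) - cost_penalty * m.cost_per_turn

    return max(affordable, key=score)


# Toy pool: a cheap model and an expensive one with different embeddings.
POOL = [
    Candidate("small", (1.0, 0.0), cost_per_turn=1.0),
    Candidate("large", (0.0, 1.0), cost_per_turn=10.0),
]
# Weights that score only the interaction terms (a dot product of the two vectors).
WEIGHTS = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]

easy_turn = route_turn((1.0, 0.0), POOL, WEIGHTS, remaining_budget=50.0, cost_penalty=0.1)
hard_turn = route_turn((0.0, 1.0), POOL, WEIGHTS, remaining_budget=50.0, cost_penalty=0.1)
```

In this toy setup the choice depends on both the history vector and the remaining budget: a history that aligns with the expensive model's embedding selects it only while its per-turn cost still fits the budget. A real system would replace the hand-set weights with an estimator fitted to logged trajectories.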

0 Citations

0 Influential

32.5 Altmetric

162.5 Score
