2604.05859v1 Apr 07, 2026 cs.AI

When Do We Need LLMs? A Diagnostic for Language-Driven Bandits

Anton Ipsen (Citations: 0, h-index: 0)

Parisa Zehtabi (Citations: 100, h-index: 5)

Manuela Veloso (Citations: 69, h-index: 5)

Fernando Acero (Citations: 8, h-index: 1)

Michael Cashmore (Citations: 4, h-index: 1)

Uljad Berdica (Citations: 34, h-index: 4)

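This work addresses Contextual Multi-Armed Bandit (CMAB) problems: non-episodic sequential decision making in which the context contains both textual and numerical information (for example, recommendation systems, dynamic portfolio adjustment, and offer selection, all of which arise frequently in finance). Large Language Models (LLMs) are increasingly applied to such problems, but running LLM reasoning at every decision step is computationally expensive, and uncertainty estimates are hard to obtain. To address this, the authors propose LLMP-UCB, a bandit algorithm that obtains uncertainty estimates from an LLM through repeated inference. The experiments nevertheless show that lightweight numerical bandits operating on text embeddings (dense or Matryoshka) match or exceed the accuracy of LLM-based solutions at a much lower cost. The authors also show that embedding dimensionality is a practical lever on the exploration-exploitation balance, allowing cost-performance tradeoffs without prompt complexity. Finally, for practitioners, they propose a geometric diagnostic, based on the arms' embeddings, for deciding whether to use LLM-driven reasoning or a lightweight numerical bandit. The results provide a principled framework for building cost-effective, uncertainty-aware decision systems, with broad applicability across AI use cases, including financial services.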
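The idea of obtaining an uncertainty estimate from repeated inference can be illustrated with a minimal sketch. This is not the paper's LLMP-UCB; the summary does not give its scoring rule. The sketch assumes that each LLM call returns a scalar reward estimate for an arm, and that the spread of repeated estimates serves as the exploration bonus. The names `ucb_from_samples`, `choose_arm`, `make_fake_llm`, `n_samples`, and `beta` are hypothetical, and the "LLM" is a deterministic stub.

```python
import statistics
from typing import Callable, Dict, List, Sequence


def ucb_from_samples(samples: Sequence[float], beta: float = 1.0) -> float:
    """Optimistic score: sample mean plus beta times the sample standard deviation.

    Assumption: disagreement between repeated estimates stands in for uncertainty.
    """
    mean = statistics.fmean(samples)
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean + beta * spread


def choose_arm(
    arms: Sequence[str],
    context: str,
    query_llm: Callable[[str, str], float],
    n_samples: int = 3,
    beta: float = 1.0,
) -> str:
    """Query the (hypothetical) LLM n_samples times per arm and pick the best score."""
    scores: Dict[str, float] = {}
    for arm in arms:
        samples = [query_llm(arm, context) for _ in range(n_samples)]
        scores[arm] = ucb_from_samples(samples, beta)
    return max(scores, key=scores.get)


def make_fake_llm(canned: Dict[str, List[float]]) -> Callable[[str, str], float]:
    """Stub standing in for an LLM call: cycles through canned estimates per arm."""
    counters = {arm: 0 for arm in canned}

    def query(arm: str, context: str) -> float:
        values = canned[arm]
        value = values[counters[arm] % len(values)]
        counters[arm] += 1
        return value

    return query


canned_estimates = {
    "offer_a": [0.5, 0.5, 0.5],  # confident, moderate reward
    "offer_b": [0.2, 0.6, 0.4],  # lower mean, but the estimates disagree
}
optimistic = choose_arm(["offer_a", "offer_b"], "ctx", make_fake_llm(canned_estimates), beta=1.0)
greedy = choose_arm(["offer_a", "offer_b"], "ctx", make_fake_llm(canned_estimates), beta=0.0)
```

With `beta=0.0` the rule is purely greedy on the mean estimate. A positive `beta` favors arms whose repeated estimates disagree. The sketch also makes the cost concern concrete: every decision requires `n_samples` LLM calls per arm.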

Original Abstract

We study Contextual Multi-Armed Bandits (CMABs) for non-episodic sequential decision making problems where the context includes both textual and numerical information (e.g., recommendation systems, dynamic portfolio adjustments, offer selection; all frequent problems in finance). While Large Language Models (LLMs) are increasingly applied to these settings, utilizing LLMs for reasoning at every decision step is computationally expensive and uncertainty estimates are difficult to obtain. To address this, we introduce LLMP-UCB, a bandit algorithm that derives uncertainty estimates from LLMs via repeated inference. However, our experiments demonstrate that lightweight numerical bandits operating on text embeddings (dense or Matryoshka) match or exceed the accuracy of LLM-based solutions at a fraction of their cost. We further show that embedding dimensionality is a practical lever on the exploration-exploitation balance, enabling cost-performance tradeoffs without prompt complexity. Finally, to guide practitioners, we propose a geometric diagnostic based on the arms' embedding to decide when to use LLM-driven reasoning versus a lightweight numerical bandit. Our results provide a principled deployment framework for cost-effective, uncertainty-aware decision systems with broad applicability across AI use cases in financial services.
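A lightweight numerical bandit operating on text embeddings might look like the following sketch. The abstract does not name the numerical bandit it uses, so a per-arm LinUCB model is assumed here purely for illustration. The Matryoshka-style reduction is also an assumption: it keeps the first `dim` coordinates of an embedding and rescales to unit length. `truncate_embedding`, `LinUCBArm`, `alpha`, and the toy vector are hypothetical; in practice the embedding would come from a text embedding model.

```python
import math
from typing import List

Vector = List[float]


def truncate_embedding(embedding: Vector, dim: int) -> Vector:
    """Keep the first `dim` coordinates and rescale to unit length (assumed scheme)."""
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix] if norm > 0 else prefix


class LinUCBArm:
    """One arm of a disjoint LinUCB model with a ridge regularizer of 1."""

    def __init__(self, dim: int, alpha: float = 1.0) -> None:
        self.alpha = alpha
        # Inverse of A = I + sum of x x^T, maintained incrementally.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x

    def _a_inv_times(self, x: Vector) -> Vector:
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def bonus(self, x: Vector) -> float:
        """Exploration bonus alpha * sqrt(x^T A^-1 x)."""
        quad = sum(xi * v for xi, v in zip(x, self._a_inv_times(x)))
        return self.alpha * math.sqrt(max(quad, 0.0))

    def score(self, x: Vector) -> float:
        """Estimated reward theta^T x plus the exploration bonus."""
        theta = self._a_inv_times(self.b)
        return sum(t * xi for t, xi in zip(theta, x)) + self.bonus(x)

    def update(self, x: Vector, reward: float) -> None:
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


# Toy 3-dimensional "embedding" reduced to 2 dimensions.
context = truncate_embedding([3.0, 4.0, 12.0], dim=2)
arm = LinUCBArm(dim=2, alpha=1.0)
bonus_before = arm.bonus(context)
arm.update(context, reward=1.0)
bonus_after = arm.bonus(context)
score_after = arm.score(context)
```

In this sketch, `dim` is where embedding dimensionality enters. It sets the size of each arm's model and therefore the per-step cost, which is quadratic in `dim` here. The bonus shrinks as an arm accumulates observations in a given direction. The sketch does not reproduce how the paper measures the effect of dimensionality on exploration.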

0 Citations

0 Influential

2.5 Altmetric

12.5 Score
