2601.11905v1 Jan 17, 2026 cs.AI

LIBRA: Language Model Informed Bandit Recourse Algorithm for Personalized Treatment Planning

Junyu Cao

Ruijiang Gao

Esmaeil Keyvanshokooh

Jianhao Ma

Abstract

We introduce a unified framework that seamlessly integrates algorithmic recourse, contextual bandits, and large language models (LLMs) to support sequential decision-making in high-stakes settings such as personalized medicine. We first introduce the recourse bandit problem, where a decision-maker must select both a treatment action and a feasible, minimal modification to mutable patient features. To address this problem, we develop the Generalized Linear Recourse Bandit (GLRB) algorithm. Building on this foundation, we propose LIBRA, a Language Model-Informed Bandit Recourse Algorithm that strategically combines domain knowledge from LLMs with the statistical rigor of bandit learning. LIBRA offers three key guarantees: (i) a warm-start guarantee, showing that LIBRA significantly reduces initial regret when LLM recommendations are near-optimal; (ii) an LLM-effort guarantee, proving that the algorithm consults the LLM only $O(\log^2 T)$ times, where $T$ is the time horizon, ensuring long-term autonomy; and (iii) a robustness guarantee, showing that LIBRA never performs worse than a pure bandit algorithm even when the LLM is unreliable. We further establish matching lower bounds that characterize the fundamental difficulty of the recourse bandit problem and demonstrate the near-optimality of our algorithms. Experiments on synthetic environments and a real hypertension-management case study confirm that GLRB and LIBRA improve regret, treatment quality, and sample efficiency compared with standard contextual bandits and LLM-only benchmarks. Our results highlight the promise of recourse-aware, LLM-assisted bandit algorithms for trustworthy LLM-bandit collaboration in personalized high-stakes decision-making.
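The abstract does not give GLRB's update rules. As a rough illustration of the recourse bandit problem it defines (choosing a treatment and a small change to mutable features together), the sketch below swaps in a standard logistic GLM-UCB with a cost penalty on the modification. It is not the authors' GLRB or LIBRA and omits LLM consultation entirely. The penalty weight `lam`, the ±`step` grid, the `max_changes` budget, the block-wise joint feature map, and every function and class name are hypothetical choices made for this example.

```python
import math
import random
from itertools import product


def sigmoid(z):
    # Numerically stable logistic link.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def joint_features(x, action, n_actions):
    """Hypothetical feature map: place (modified) features in the chosen action's block."""
    d = len(x)
    phi = [0.0] * (d * n_actions)
    phi[action * d:(action + 1) * d] = x
    return phi


def candidate_recourses(x, mutable, step, max_changes):
    """Enumerate +/-step changes to mutable features, changing at most max_changes of them."""
    out = []
    for deltas in product((-step, 0.0, step), repeat=len(mutable)):
        if sum(1 for dlt in deltas if dlt != 0.0) > max_changes:
            continue
        x_mod = list(x)
        for idx, dlt in zip(mutable, deltas):
            x_mod[idx] += dlt
        cost = sum(abs(dlt) for dlt in deltas)  # L1 size of the modification
        out.append((x_mod, cost))
    return out


class RecourseGLMUCB:
    """Toy logistic GLM-UCB over joint (treatment, feature-modification) choices."""

    def __init__(self, d, n_actions, alpha=1.0, lam=0.1, reg=1.0, lr=0.5, iters=20):
        self.k = n_actions
        dim = d * n_actions
        # Inverse of the regularized design matrix V = reg * I + sum(phi phi^T).
        self.V_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)] for i in range(dim)]
        self.theta = [0.0] * dim
        self.alpha, self.lam, self.reg = alpha, lam, reg
        self.lr, self.iters = lr, iters
        self.history = []

    def select(self, x, mutable, step, max_changes):
        best = None
        for x_mod, cost in candidate_recourses(x, mutable, step, max_changes):
            for a in range(self.k):
                phi = joint_features(x_mod, a, self.k)
                width = math.sqrt(dot(phi, matvec(self.V_inv, phi)))
                # Optimistic success probability minus a penalty on modification size.
                score = sigmoid(dot(self.theta, phi) + self.alpha * width) - self.lam * cost
                if best is None or score > best[0]:
                    best = (score, a, x_mod, phi)
        return best[1], best[2], best[3]

    def update(self, phi, reward):
        # Sherman-Morrison rank-one update of V^{-1} after adding phi phi^T to V.
        v = matvec(self.V_inv, phi)
        denom = 1.0 + dot(phi, v)
        n = len(phi)
        for i in range(n):
            for j in range(n):
                self.V_inv[i][j] -= v[i] * v[j] / denom
        self.history.append((phi, reward))
        # Refit theta by gradient descent on the L2-regularized mean logistic loss.
        m = len(self.history)
        for _ in range(self.iters):
            grad = [self.reg * t for t in self.theta]
            for f, r in self.history:
                err = sigmoid(dot(self.theta, f)) - r
                for i in range(n):
                    grad[i] += err * f[i]
            self.theta = [t - self.lr * g / m for t, g in zip(self.theta, grad)]


if __name__ == "__main__":
    rng = random.Random(0)
    true_theta = [0.2, -0.8, 0.5, -0.3, 0.6, -0.4]  # hypothetical ground truth (2 actions x 3 features)
    agent = RecourseGLMUCB(d=3, n_actions=2)
    for _ in range(200):
        x = [1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)]  # feature 0 is an immutable intercept
        a, x_mod, phi = agent.select(x, mutable=[1, 2], step=0.5, max_changes=1)
        reward = 1 if rng.random() < sigmoid(dot(true_theta, phi)) else 0
        agent.update(phi, reward)
    print("estimated theta:", [round(t, 2) for t in agent.theta])
```

Scoring each pair by an optimistic probability minus `lam` times the L1 size of the change means a larger modification is chosen only when its estimated gain outweighs its cost, which is one simple way to encode the "feasible, minimal modification" requirement; the paper may formalize feasibility and minimality differently.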