2605.00369v1 May 01, 2026 cs.LG

AlphaInventory: Evolving Inventory Policies with Large Language Models, Including Deployment Guarantees

AlphaInventory: Evolving White-Box Inventory Policies via Large Language Models with Deployment Guarantees

Bo Jiang

Citations: 9

h-index: 1

Benyou Wang

Citations: 656

h-index: 7

Ruoqing Jiang

Citations: 120

h-index: 1

Jianghao Lin

Citations: 43

h-index: 2

Chenyu Huang

Citations: 120

h-index: 2

Zhengyang Tang

Citations: 1

h-index: 1

Lai Wei

Citations: 14

h-index: 2

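This study examines how large language models (LLMs) can be used to evolve inventory policies in online, non-stationary environments. Recent LLM-based evolutionary search methods, notably AlphaEvolve, perform strongly on static, highly structured problems such as mathematical discovery, but are not directly suited to dynamic online inventory settings. We therefore propose AlphaInventory, an end-to-end inventory-policy evolution and inference framework grounded in confidence-interval-based certification. The framework trains an LLM with reinforcement learning, incorporates demand data together with numerical and textual features beyond demand, and generates transparent (white-box) inventory policies with statistical safety guarantees for deployment in future periods. It also introduces a unified theoretical interface connecting training, inference, and deployment, which is used to analyze the probability that AlphaInventory evolves a statistically safe and improved policy and to quantify the deployment gap relative to the oracle-safe benchmark. On both synthetic and real-world retail data, AlphaInventory outperforms classical inventory policies and deep-learning-based methods, and in canonical inventory settings it evolves new policies that improve on existing benchmarks.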

Original Abstract

We study how large language models can be used to evolve inventory policies in online, non-stationary environments. Our work is motivated by recent advances in LLM-based evolutionary search, such as AlphaEvolve, which demonstrates strong performance on static and highly structured problems such as mathematical discovery, but is not directly suited to online dynamic inventory settings. To this end, we propose AlphaInventory, an end-to-end inventory-policy evolution and inference framework grounded in confidence-interval-based certification. The framework trains a large language model using reinforcement learning, incorporates demand data as well as numerical and textual features beyond demand, and generates white-box inventory policies with statistical safety guarantees for deployment in future periods. We further introduce a unified theoretical interface that connects training, inference, and deployment. This allows us to characterize the probability that AlphaInventory evolves a statistically safe and improved policy, and to quantify the deployment gap relative to the oracle-safe benchmark. Tested on both synthetic data and real-world retail data, AlphaInventory outperforms classical inventory policies and deep-learning-based methods. In canonical inventory settings, it evolves new policies that improve upon existing benchmarks.
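
The abstract does not spell out the certification procedure, so the following is only a generic sketch of what a confidence-interval-based check could look like, not the paper's method. It assumes a zero-lead-time order-up-to policy with backlogging, i.i.d. demand, and a normal approximation for the mean paired cost difference. The function names, cost parameters, and demand model are all hypothetical.

```python
import math
import random
import statistics


def base_stock_cost(level, demands, holding=1.0, backorder=9.0):
    """Per-period costs of an order-up-to policy (assumed: zero lead time, backlogging)."""
    costs = []
    for d in demands:
        leftover = level - d
        costs.append(holding * max(leftover, 0.0) + backorder * max(-leftover, 0.0))
    return costs


def certify_improvement(candidate_costs, baseline_costs, alpha=0.05):
    """One-sided check: is the candidate's mean cost below the baseline's?

    Returns (certified, upper), where upper is an approximate (1 - alpha)
    upper confidence bound on mean(candidate - baseline). The candidate is
    certified only if that bound is strictly negative.
    """
    diffs = [c - b for c, b in zip(candidate_costs, baseline_costs)]
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    z = statistics.NormalDist().inv_cdf(1.0 - alpha)
    upper = mean + z * sd / math.sqrt(len(diffs))
    return upper < 0.0, upper


# Hypothetical evaluation data: 500 periods of truncated normal demand.
rng = random.Random(0)
demands = [max(rng.gauss(100.0, 20.0), 0.0) for _ in range(500)]

# Baseline orders up to mean demand; the candidate uses the newsvendor
# critical ratio backorder / (holding + backorder) = 0.9.
baseline = base_stock_cost(100.0, demands)
candidate_level = 100.0 + 20.0 * statistics.NormalDist().inv_cdf(0.9)
candidate = base_stock_cost(candidate_level, demands)

certified, upper = certify_improvement(candidate, baseline)
print(f"certified={certified}, upper bound on cost difference={upper:.2f}")
```

Pairing the two policies on the same demand path reduces the variance of the comparison. A policy compared against itself is never certified, because the bound is then exactly zero rather than negative.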

1 Citation

0 Influential

3.5 Altmetric

18.5 Score
