2601.23229v1 Jan 30, 2026 cs.AI

L∞ 강건 MDP의 정책 반복 알고리즘의 강한 다항 시간 복잡도

Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs

Carlo Pagano

Citations: 57

h-index: 4

Ali Asadi

Citations: 7

h-index: 1

Krishnendu Chatterjee

Citations: 5

h-index: 1

E. Goharshady

Citations: 163

h-index: 7

Mehrdad Karrabi

Citations: 44

h-index: 4

Alipasha Montaseri

Citations: 5

h-index: 1

마르코프 결정 과정(MDP)은 순차적 의사 결정의 기본적인 모델입니다. 강건 MDP(RMDP)는 변환 확률의 불확실성을 허용하고, 그 불확실성의 최악의 경우에 대해 최적화하는 방식으로 이 프레임워크를 확장합니다. 특히, (s, a)-직사각형 RMDP는 L∞ 불확실성 집합을 가지며, 이는 고전적인 MDP와 턴 기반 확률 게임을 포괄하는 기본적인 모델입니다. 본 연구에서는 할인된 보상을 갖는 이 모델을 고려합니다. 이러한 최적화 모델에 대한 다항 시간 및 강한 다항 시간 알고리즘의 존재 여부는 매우 중요한 문제입니다. MDP의 경우, 선형 계획법은 임의의 할인 계수에 대해 다항 시간 알고리즘을 제공하며, Ye의 선구적인 연구는 고정된 할인 계수에 대해 강한 다항 시간을 확립했습니다. 이러한 결과를 RMDP로 일반화하는 것은 중요한 미해결 문제로 남아있었습니다. 본 연구에서는 (s, a)-직사각형 L∞ RMDP에서, 상수(고정) 할인 계수를 갖는 강건 정책 반복 알고리즘이 강한 다항 시간으로 실행됨을 보여줌으로써, 중요한 알고리즘 문제를 해결합니다.

Original Abstract

Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_\infty$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games. We consider this model with discounted payoffs. The existence of polynomial and strongly-polynomial time algorithms is a fundamental problem for these optimization models. For MDPs, linear programming yields polynomial-time algorithms for any arbitrary discount factor, and the seminal work of Ye established strongly--polynomial time for a fixed discount factor. The generalization of such results to RMDPs has remained an important open problem. In this work, we show that a robust policy iteration algorithm runs in strongly-polynomial time for $(s, a)$-rectangular $L_\infty$ RMDPs with a constant (fixed) discount factor, resolving an important algorithmic question.

4 Citations

0 Influential

3.5 Altmetric

21.5 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!