2604.10911v1 Apr 13, 2026 cs.AI

EvoNash-MARL: A Closed-Loop Multi-Agent Reinforcement Learning Framework for Medium-Horizon Equity Allocation

Pengwei Li

Citations: 14,999

h-index: 8

Chongliu Jia

Citations: 11

h-index: 3

Youshuang Hu

Citations: 0

h-index: 0

Qiya Wang

Citations: 7

h-index: 1

Si Han

Citations: 23

h-index: 3

Yichuan Luo

Citations: 0

h-index: 0

Jie Ding

Citations: 145

h-index: 6

Yimiao Qian

Citations: 0

h-index: 0

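Medium-to-long-horizon equity allocation is difficult because of uncertain predictive structure, non-stationary market regimes, transaction costs, capacity limits, and the degradation of signals once tail-risk constraints are applied. Existing approaches typically rely on a single predictive model or on a loosely coupled prediction-to-allocation pipeline, which limits robustness to shifts in the data distribution. This study asks whether combining reinforcement learning (RL), multi-agent policy populations, Policy-Space Response Oracle (PSRO)-style aggregation, league best-response training, evolutionary replacement, and execution-aware checkpoint selection within a unified walk-forward loop can improve the robustness of medium-to-long-horizon allocators. The proposed EvoNash-MARL framework integrates these components within an execution-aware allocation loop and additionally introduces a layered policy architecture for directional prediction and risk management, nonlinear signal enhancement, feature-quality reweighting, and constraint-aware checkpoint selection. Under a 120-window walk-forward protocol, the resolved v21 configuration achieves a mean excess Sharpe of 0.7600 and a robust score of -0.0203, ranking first among the internal controls. On aligned daily out-of-sample returns from 2014-01-02 to 2024-01-05, it delivers an annualized return of 19.6% versus 11.7% for SPY, and in an extended walk-forward evaluation through 2026-02-10 it delivers 20.5% versus 13.5% for SPY. The framework maintains positive performance under realistic stress constraints and shows structured cross-market generalization. However, global strong significance under White's Reality Check (WRC) and SPA-lite testing is not established. The results are therefore presented as evidence for a more stable medium-to-long-horizon training and selection paradigm rather than as proof of universally superior market-timing performance.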
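The walk-forward protocol mentioned above can be illustrated with a small index generator. This is only a sketch of the general technique: the abstract does not state window lengths or step size, so `train_len`, `test_len`, and `step` are hypothetical parameters, and the function name `walk_forward_windows` is introduced here for illustration.

```python
from typing import Iterator, Optional, Tuple


def walk_forward_windows(
    n_obs: int,
    train_len: int,
    test_len: int,
    step: Optional[int] = None,
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for a rolling walk-forward split.

    Each test block starts immediately after its training block, so no
    test observation is ever used for fitting in the same window.
    All lengths are assumed values; the paper's settings are not given.
    """
    if train_len <= 0 or test_len <= 0:
        raise ValueError("train_len and test_len must be positive")
    if step is None:
        step = test_len  # default: non-overlapping test blocks
    if step <= 0:
        raise ValueError("step must be positive")

    start = 0
    while start + train_len + test_len <= n_obs:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += step


if __name__ == "__main__":
    for train, test in walk_forward_windows(n_obs=100, train_len=60, test_len=10):
        print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

In a closed-loop setup such as the one described, training and checkpoint selection would run on each training range and the selected allocator would be scored only on the following test range.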

Original Abstract

Medium-to-long-horizon stock allocation presents significant challenges due to weak predictive structures, non-stationary market regimes, and the degradation of signals following the application of transaction costs, capacity limits, and tail-risk constraints. Conventional approaches commonly rely on a single predictor or a loosely coupled prediction-to-allocation pipeline, limiting robustness under distribution shift. This work addresses a targeted design question: whether coupling reinforcement learning (RL), multi-agent policy populations, Policy-Space Response Oracle (PSRO)-style aggregation, league best-response training, evolutionary replacement, and execution-aware checkpoint selection within a unified walk-forward loop improves allocator robustness at medium to long horizons. The proposed framework, EvoNash-MARL, integrates these components within an execution-aware allocation loop and further introduces a layered policy architecture comprising a direction head and a risk head, nonlinear signal enhancement, feature-quality reweighting, and constraint-aware checkpoint selection. Under a 120-window walk-forward protocol, the resolved v21 configuration achieves mean excess Sharpe 0.7600 and robust score -0.0203, ranking first among internal controls; on aligned daily out-of-sample returns from 2014-01-02 to 2024-01-05, it delivers 19.6% annualized return versus 11.7% for SPY, and in an extended walk-forward evaluation through 2026-02-10 it delivers 20.5% versus 13.5%. The framework maintains positive performance under realistic stress constraints and exhibits structured cross-market generalization; however, global strong significance under White's Reality Check (WRC) and SPA-lite testing is not established. Therefore, the results are presented as evidence supporting a more stable medium- to long-horizon training and selection paradigm, rather than as proof of universally superior market-timing performance.
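For readers unfamiliar with the headline metrics, the sketch below shows one common way to compute an annualized return and a Sharpe ratio of excess returns from aligned daily series. The abstract does not define "excess Sharpe" or the annualization convention, so the information-ratio-style definition used here, the 252-trading-day year, and the function names are all assumptions rather than the authors' method.

```python
import math
from typing import Sequence

TRADING_DAYS = 252  # assumed annualization factor; not stated in the abstract


def annualized_return(daily_returns: Sequence[float],
                      periods_per_year: int = TRADING_DAYS) -> float:
    """Geometric annualized return from simple daily returns."""
    if not daily_returns:
        raise ValueError("daily_returns must be non-empty")
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r  # compound each day's simple return
    return growth ** (periods_per_year / len(daily_returns)) - 1.0


def excess_sharpe(strategy: Sequence[float],
                  benchmark: Sequence[float],
                  periods_per_year: int = TRADING_DAYS) -> float:
    """Annualized mean/std of daily (strategy - benchmark) returns.

    Assumed definition: the paper may compute excess Sharpe differently.
    """
    if len(strategy) != len(benchmark):
        raise ValueError("series must be aligned to the same length")
    if len(strategy) < 2:
        raise ValueError("need at least two observations")
    diffs = [s - b for s, b in zip(strategy, benchmark)]
    mean = sum(diffs) / len(diffs)
    var = sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1)  # sample variance
    if var == 0.0:
        raise ValueError("excess returns have zero variance")
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)
```

Applied to an allocator's daily out-of-sample returns and a benchmark series such as SPY's, these two functions would produce figures of the same kind as those quoted in the abstract, subject to the assumptions noted above.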
