2605.01954v1 May 03, 2026 cs.AI

Moira: Language-driven Hierarchical Reinforcement Learning for Pair Trading

Sophia Ananiadou (Citations: 52, h-index: 3)
Xueqing Peng (Citations: 656, h-index: 13)
Lingfei Qian (Citations: 359, h-index: 10)
Jimin Huang (Citations: 249, h-index: 8)
Polydoros Giannouris (Citations: 54, h-index: 4)
Yuechen Jiang (Citations: 497, h-index: 7)
Yuyan Wang (Citations: 19, h-index: 2)
Guojun Xiong (Citations: 479, h-index: 8)

Abstract

Many sequential decision-making problems exhibit hierarchical structure, where high-level semantic choices constrain downstream actions and feedback is delayed and ambiguous. Learning in such settings is challenging due to credit assignment: performance degradation may arise from flawed abstractions, suboptimal execution, or their interaction. We study this challenge through pair trading, a domain that naturally combines long-horizon semantic reasoning for asset pair selection with short-horizon execution under partial observability. We formulate pair trading as a hierarchical reinforcement learning problem and propose a language-driven optimization framework in which both high-level and low-level policies are parameterized by large language models (LLMs) and optimized exclusively through prompt updates. Our approach leverages pretrained LLMs as hierarchical policies and uses trajectory- and episode-level textual feedback to adapt abstractions and execution without gradient-based fine-tuning. By explicitly separating abstraction selection from execution, the framework reduces non-stationarity across hierarchical levels and enables targeted adaptation under delayed feedback. Experiments on real-world market data show consistent improvements over traditional and LLM-based baselines, demonstrating the effectiveness of language-driven hierarchical reinforcement learning.
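
The two-level loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`PromptPolicy`, `run_episode`, `adapt`, `toy_llm`) is hypothetical, the LLM is replaced by a deterministic stub, the reward is a simple spread PnL, and sending trajectory-level feedback to the low-level policy and episode-level feedback to the high-level policy is one assumed reading of the abstract. The only point it shows is that both policies are "trained" by editing prompt text rather than by gradient updates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical type: any function that maps a prompt string to a text reply.
LLM = Callable[[str], str]


@dataclass
class PromptPolicy:
    """A policy whose only adjustable 'parameter' is its prompt text."""
    llm: LLM
    prompt: str
    history: List[str] = field(default_factory=list)

    def act(self, observation: str) -> str:
        return self.llm(f"{self.prompt}\n\nObservation: {observation}").strip()

    def update(self, feedback: str) -> None:
        # Prompt update in place of a gradient step: append textual feedback.
        self.history.append(feedback)
        self.prompt = f"{self.prompt}\nLesson: {feedback}"


def run_episode(high: PromptPolicy, low: PromptPolicy, pairs: List[str],
                spreads: Dict[str, List[float]]) -> Tuple[str, float]:
    """High level picks a pair once; low level trades its spread step by step."""
    pair = high.act("Candidate pairs: " + ", ".join(pairs))
    if pair not in spreads:
        pair = pairs[0]  # fall back if the reply is not a valid pair name
    series = spreads[pair]
    pnl = 0.0
    for t in range(len(series) - 1):
        action = low.act(f"pair={pair} spread={series[t]:.2f}")
        position = {"LONG": 1, "SHORT": -1}.get(action, 0)  # anything else = flat
        pnl += position * (series[t + 1] - series[t])
    return pair, pnl


def adapt(high: PromptPolicy, low: PromptPolicy, pair: str, pnl: float) -> None:
    """Assumed split: execution feedback to low level, selection feedback to high."""
    low.update(f"On {pair}, execution earned {pnl:+.2f}.")
    high.update(f"Choosing {pair} led to episode PnL {pnl:+.2f}.")


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call (purely illustrative)."""
    if "Candidate pairs: " in prompt:
        return prompt.rsplit("Candidate pairs: ", 1)[1].split(",")[0]
    spread = float(prompt.rsplit("spread=", 1)[1])
    return "SHORT" if spread > 0 else "LONG"  # naive mean-reversion rule


if __name__ == "__main__":
    high = PromptPolicy(toy_llm, "Select one pair.")
    low = PromptPolicy(toy_llm, "Trade the spread.")
    data = {"AAA/BBB": [1.0, 0.5, -0.5, 0.0], "CCC/DDD": [0.0, 0.0]}
    pair, pnl = run_episode(high, low, list(data), data)
    adapt(high, low, pair, pnl)
    print(pair, pnl)
    print(high.prompt)
```

Keeping the two `update` calls separate mirrors the abstract's separation of abstraction selection from execution: a poor episode can be attributed to the pair choice, to the trading, or to both, and each prompt is revised independently. With a real model, `toy_llm` would be replaced by an actual API call and the feedback strings by model-generated critiques.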
