2604.23988v1 Apr 27, 2026 cs.LG

Hindsight Preference Optimization for Financial Time Series Advisory

Ya Cui (Citations: 43, h-index: 4)

Guanghui Wang (Citations: 23, h-index: 3)

Pei-Gen He (Citations: 21, h-index: 3)

Wei Qiu, School of Computer Science and Engineering, Nanyang Technological University, Singapore (Citations: 498, h-index: 8)

Ziyuan Li (Citations: 104, h-index: 4)

Bing Zhu (Citations: 39, h-index: 4)

Xing Zhang (Citations: 22, h-index: 3)

Zhengwei Yu (Citations: 131, h-index: 6)

Anqi Xin (Citations: 0, h-index: 0)

Xusheng Wang (Citations: 7, h-index: 1)

Abstract

Time series models predict numbers; decision-makers need advisory -- directional signals with reasoning, actionable suggestions, and risk management. Training language models for such predictive advisory faces a fundamental challenge: quality depends on outcomes unknown at prediction time. We bridge two ideas from reinforcement learning -- using information unavailable during execution to retrospectively generate training signal, and preference alignment -- and propose Hindsight Preference Optimization: observed outcomes let an LLM judge rank candidate advisories on dimensions that scalar metrics cannot capture, producing preference pairs for DPO without human annotation. We apply this to Vision-Language-Model-based predictive advisories on S&P 500 equity time series, where a 4B model outperforms its 235B teacher on both accuracy and advisory quality.
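
The pair-construction idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Advisory` fields, `toy_judge` (a rule-based stand-in for the LLM judge), `build_preference_pair`, and the `min_margin` threshold are all hypothetical names introduced here. The `dpo_loss` function is the standard per-pair DPO objective, computed from summed log-probabilities under the policy and a frozen reference model.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Advisory:
    text: str       # full advisory text (reasoning, suggestion, risk notes)
    direction: str  # hypothetical field: "up" or "down"


# A judge maps (advisory, realized outcome) to a quality score.
# The abstract assigns this role to an LLM; here it is any callable.
Judge = Callable[[Advisory, float], float]


def toy_judge(advisory: Advisory, realized_return: float) -> float:
    """Rule-based stand-in for an LLM judge (assumption, illustration only).

    Rewards a correct directional call and adds a small bonus when the
    text mentions a stop level, as a crude proxy for risk management.
    """
    correct = (advisory.direction == "up") == (realized_return > 0)
    risk_bonus = 0.5 if "stop" in advisory.text.lower() else 0.0
    return (1.0 if correct else 0.0) + risk_bonus


def build_preference_pair(
    candidates: List[Advisory],
    judge: Judge,
    realized_return: float,
    min_margin: float = 0.25,
) -> Optional[Tuple[Advisory, Advisory]]:
    """Rank candidates with hindsight and return (chosen, rejected).

    The realized outcome is only available after the fact, so this runs
    offline when building training data, never at prediction time.
    Returns None when the score gap is too small to be informative.
    """
    scored = [(judge(a, realized_return), a) for a in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    (best_score, best), (worst_score, worst) = scored[0], scored[-1]
    if best_score - worst_score < min_margin:
        return None
    return best, worst


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (policy gap - reference gap)))."""
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))


candidates = [
    Advisory("Expect upside; set a stop-loss below recent support.", "up"),
    Advisory("Expect a decline; reduce exposure.", "down"),
    Advisory("Expect upside; add to the position.", "up"),
]
pair_after_rise = build_preference_pair(candidates, toy_judge, realized_return=0.03)
pair_after_fall = build_preference_pair(candidates, toy_judge, realized_return=-0.03)
```

In this toy setup the same three candidates yield different chosen and rejected advisories depending on the realized return, which is the sense in which the training signal is generated retrospectively. A real pipeline would replace `toy_judge` with a prompted LLM and feed the resulting pairs to a DPO trainer.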

