2605.02572v1 May 04, 2026 cs.AI

On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length

Jinyoung Yeo (Citations: 776; h-index: 14)
Liang Wang (Citations: 81; h-index: 4)
Nan Yang (Citations: 1,865; h-index: 12)
Xingxing Zhang (Citations: 663; h-index: 9)
Taeyoon Kwon, Yonsei University (Citations: 364; h-index: 9)
Junhee Cho (Citations: 28; h-index: 2)
Beong-woo Kwak (Citations: 218; h-index: 8)
Furu Wei (Citations: 376; h-index: 6)
Sunghwan Kim (Citations: 14; h-index: 2)

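Large language models (LLMs) have shown promise as interactive agents that solve tasks by interacting with an environment. Prior work has focused mainly on system-level optimization or algorithmic improvements, while the effect of task horizon length on the training process remains poorly understood. This work presents a systematic empirical study of horizon length through controlled task design. Specifically, the authors construct controlled tasks in which agents share the same decision rules and reasoning structure but differ in the length of the action sequence required for successful completion. The results show that increasing horizon length alone creates a training bottleneck, causing severe training instability driven by exploration difficulty and credit assignment problems. Reducing the horizon is shown to be a key principle for overcoming this limitation: it stabilizes training and yields better performance on long-horizon tasks. Horizon reduction is also associated with stronger generalization across horizon lengths: models trained under reduced horizons generalize more effectively to longer-horizon variants at inference time, a phenomenon the authors call "horizon generalization".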

Original Abstract

Large language models (LLMs) have shown promise as interactive agents that solve tasks through extended sequences of environment interactions. While prior work has primarily focused on system-level optimizations or algorithmic improvements, the role of task horizon length in shaping training dynamics remains poorly understood. In this work, we present a systematic empirical study that examines horizon length through controlled task constructions. Specifically, we construct controlled tasks in which agents face identical decision rules and reasoning structures, but differ only in the length of action sequences required for successful completion. Our results reveal that increasing horizon length alone constitutes a training bottleneck, inducing severe training instability driven by exploration difficulties and credit assignment challenges. We demonstrate that horizon reduction is a key principle to address this limitation, stabilizing training and achieving better performance in long-horizon tasks. Moreover, we find that horizon reduction is related to stronger generalization across horizon lengths: models trained under reduced horizons generalize more effectively to longer-horizon variants at inference time, a phenomenon we refer to as horizon generalization.
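As a rough intuition for why horizon length alone could make exploration and credit assignment harder, consider a toy model that is not taken from the paper: assume every step of a rollout is correct independently with the same probability, and that reward arrives only when all steps are correct. The function names and the independence assumption below are hypothetical and purely illustrative. Under those assumptions, the chance of ever observing a rewarded rollout shrinks exponentially with the horizon.

```python
def success_probability(per_step_accuracy: float, horizon: int) -> float:
    """Probability that all `horizon` steps of a rollout are correct.

    Illustrative assumption (not from the paper): steps succeed
    independently, each with probability `per_step_accuracy`.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be in [0, 1]")
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    return per_step_accuracy ** horizon


def expected_rollouts_until_success(per_step_accuracy: float, horizon: int) -> float:
    """Expected number of rollouts sampled before the first fully correct one.

    This is the mean of a geometric distribution, 1 / p, where p is the
    per-rollout success probability from the toy model above.
    """
    p = success_probability(per_step_accuracy, horizon)
    return float("inf") if p == 0.0 else 1.0 / p


# Compare a few horizons at a fixed, hypothetical per-step accuracy.
for horizon in (5, 20, 80):
    p = success_probability(0.9, horizon)
    n = expected_rollouts_until_success(0.9, horizon)
    print(f"horizon={horizon:3d}  P(success)={p:.6f}  expected rollouts={n:.1f}")
```

In this sketch, shortening the horizon raises the fraction of rollouts that carry any reward signal at all, which is one plausible reading of why horizon reduction might stabilize training. The paper's actual tasks and training setup may behave differently.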

0 Citations

0 Influential

7 Altmetric

35.0 Score
