2604.07981v1 Apr 09, 2026 cs.CL

A Decomposition Perspective to Long-context Reasoning for LLMs

Shihan Dou (citations: 4,276; h-index: 26)

Pluto Zhou (citations: 12; h-index: 2)

Zhisong Zhang (citations: 171; h-index: 3)

Guoliang Zhao (citations: 2; h-index: 1)

Huaibing Xie (citations: 211; h-index: 2)

Yiting Liu (citations: 4; h-index: 1)

Nantao Zheng (citations: 100; h-index: 2)

Ya-Li Xiao (citations: 23; h-index: 2)

Shaolei Wang (citations: 8; h-index: 2)

Cheng Zhang (citations: 31; h-index: 3)

Lemao Liu (citations: 44; h-index: 3)

Abstract

Long-context reasoning is essential for complex real-world applications, yet it remains a significant challenge for Large Language Models (LLMs). Despite rapid progress in long-context reasoning, current research often overlooks the internal complexity of the task itself. In this paper, we move beyond this holistic view and decompose long-context reasoning into a set of fundamental atomic skills. We then automatically synthesize a suite of pseudo datasets, each explicitly targeting a specific atomic skill. Our empirical analysis confirms that proficiency in these atomic skills is strongly correlated with general long-text reasoning performance. Building on this insight, we apply reinforcement learning on these pseudo datasets to sharpen the model's atomic skills, with the aim of improving its general long-context reasoning ability. Extensive experiments across multiple benchmarks demonstrate the effectiveness of our approach: it outperforms a strong baseline by an average of 7.7 percentage points (from 46.3% to 54.0%) across Loogle, Loong, LongBench-v2, BrowscompLong, Ruler-qa2, and MRCR.
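
The reported gain is an absolute difference between two average scores, so it can help to separate it from the relative improvement over the baseline. The short sketch below does that arithmetic using only the two averages stated in the abstract; the function name `summarize_gain` is a hypothetical helper introduced here for illustration and is not from the paper.

```python
def summarize_gain(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent).

    Hypothetical helper for illustration only; not part of the paper.
    """
    absolute = improved - baseline           # difference of the two averages
    relative = absolute / baseline * 100.0   # gain as a share of the baseline
    return round(absolute, 1), round(relative, 1)


# Averages taken from the abstract: baseline 46.3%, proposed approach 54.0%.
abs_gain, rel_gain = summarize_gain(46.3, 54.0)
print(f"absolute: {abs_gain} points, relative: {rel_gain}%")
```

Under these numbers the absolute gain is 7.7 points, which corresponds to a relative improvement of roughly 16.6% over the baseline average.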
