2604.22565v1 Apr 24, 2026 cs.CL

Learning Evidence Highlighting for Frozen LLMs

Yufei Li (Citations: 11, h-index: 2)
Mingfu Liang (Citations: 26, h-index: 2)
Xiaohan Wei (Citations: 31, h-index: 2)
Chongling Sun (Citations: 40, h-index: 2)
Sandeep Pandey (Citations: 1,698, h-index: 21)
Luke Simon (Citations: 24, h-index: 3)
Shaoang Li (Citations: 3, h-index: 1)
Yunchen Pu (Citations: 227, h-index: 1)
Fei Tian (Citations: 282, h-index: 7)
F. Shyu (Citations: 53, h-index: 4)
Xi Liu (Citations: 75, h-index: 3)
Jian Li (Citations: 85, h-index: 3)
Yanhang Shi (Citations: 29, h-index: 3)

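Large language models (LLMs) reason well, yet they often miss decisive evidence buried in long contexts. This work introduces HiLight, an evidence emphasis framework for frozen LLMs. HiLight decouples evidence selection from reasoning: rather than compressing or rewriting the input, it trains a lightweight Emphasis Actor to insert minimal highlight tags around the pivotal spans of the original text. A frozen Solver then carries out the downstream reasoning on the emphasized text. We cast highlighting as a weakly supervised decision-making problem and optimize the Actor with reinforcement learning. The Actor learns from the Solver's task reward alone, requiring no evidence labels and no access to or modification of the Solver. In experiments on sequential recommendation and long-context question answering, HiLight consistently outperforms existing prompt-based and automated prompt-optimization methods. The learned emphasis policy transfers readily to Solvers of various sizes, including an API-based Solver, which suggests that the Actor learns reusable evidence structure rather than overfitting to a particular model.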
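
The tag-insertion step can be illustrated with a minimal sketch. It assumes that evidence spans arrive as character offsets and that the tags are the literal strings `<hl>` and `</hl>`; both are hypothetical choices made for illustration, since the abstract does not specify the tag format or how spans are represented. The point it shows is that the context stays byte-for-byte intact apart from the inserted tags.

```python
from typing import List, Tuple


def insert_highlights(
    context: str,
    spans: List[Tuple[int, int]],
    open_tag: str = "<hl>",    # hypothetical tag; not taken from the paper
    close_tag: str = "</hl>",  # hypothetical tag; not taken from the paper
) -> str:
    """Wrap character spans [start, end) in tags, leaving all other text untouched."""
    # Merge overlapping or touching spans so that tags never nest.
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(spans):
        if not (0 <= start < end <= len(context)):
            raise ValueError(f"invalid span: {(start, end)}")
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Copy the context through unchanged, adding tags at span boundaries.
    pieces: List[str] = []
    cursor = 0
    for start, end in merged:
        pieces.append(context[cursor:start])
        pieces.append(open_tag + context[start:end] + close_tag)
        cursor = end
    pieces.append(context[cursor:])
    return "".join(pieces)


def strip_highlights(text: str, open_tag: str = "<hl>", close_tag: str = "</hl>") -> str:
    """Remove the tags again; recovers the original context if it contained no tags."""
    return text.replace(open_tag, "").replace(close_tag, "")
```

Because nothing is compressed or rewritten, stripping the tags from the emphasized text recovers the original input, which is the property that separates emphasis from summarization-style preprocessing.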

Original Abstract

Large Language Models (LLMs) can reason well, yet often miss decisive evidence when it is buried in long, noisy contexts. We introduce HiLight, an Evidence Emphasis framework that decouples evidence selection from reasoning for frozen LLM solvers. HiLight avoids compressing or rewriting the input, which can discard or distort evidence, by training a lightweight Emphasis Actor to insert minimal highlight tags around pivotal spans in the unaltered context. A frozen Solver then performs downstream reasoning on the emphasized input. We cast highlighting as a weakly supervised decision-making problem and optimize the Actor with reinforcement learning using only the Solver's task reward, requiring no evidence labels and no access to or modification of the Solver. Across sequential recommendation and long-context question answering, HiLight consistently improves performance over strong prompt-based and automated prompt-optimization baselines. The learned emphasis policy transfers zero-shot to both smaller and larger unseen Solver families, including an API-based Solver, suggesting that the Actor captures genuine, reusable evidence structure rather than overfitting to a single backbone.
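
The training signal described above can be sketched with a toy example. This is not the authors' implementation: the abstract says only that the Actor is optimized with reinforcement learning from the Solver's task reward, so the sketch substitutes plain REINFORCE with a moving-average baseline, a softmax policy over sentences, and a stand-in reward function in place of a real frozen LLM. The names `train_actor` and `toy_solver_reward` and the `<hl>` tags are hypothetical.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor(
    sentences: List[str],
    solver_reward: Callable[[str], float],
    steps: int = 500,
    lr: float = 0.5,
    seed: int = 0,
) -> List[float]:
    """REINFORCE over which single sentence to highlight (an illustrative simplification)."""
    rng = random.Random(seed)
    logits = [0.0] * len(sentences)
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(sentences)), weights=probs)[0]
        # Emphasize the chosen sentence; every other sentence is passed through as-is.
        emphasized = " ".join(
            f"<hl>{s}</hl>" if i == action else s for i, s in enumerate(sentences)
        )
        # The solver is a black box: only its scalar task reward is observed.
        reward = solver_reward(emphasized)
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        # Gradient of log pi(action) with respect to the logits is onehot(action) - probs.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


def toy_solver_reward(emphasized: str) -> float:
    """Stand-in for a frozen Solver: succeeds only when the decisive sentence is highlighted."""
    return 1.0 if "<hl>The code is 4172.</hl>" in emphasized else 0.0


sentences = ["It rained all day.", "The code is 4172.", "Lunch was pasta."]
policy = train_actor(sentences, toy_solver_reward)
```

The reward function never exposes gradients, weights, or evidence labels, so the same loop would apply unchanged to a Solver reachable only through an API, at the cost of one Solver call per training step.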

0 Citations

0 Influential

10.5 Altmetric

52.5 Score
