2605.14636v1 May 14, 2026 cs.AI

Teaching When and What Not to Know: Temporal Critique Learning for Ex-Ante Reasoning

Teaching Large Language Models When Not to Know: Learning Temporal Critique for Ex-Ante Reasoning

Zheyuan Liu (Citations: 174, h-index: 7)

Jiancan Wu (Citations: 3,690, h-index: 25)

Chenlu Ding (Citations: 38, h-index: 3)

Yanchen Luo (Citations: 540, h-index: 7)

Yancheng Yuan (Citations: 442, h-index: 7)

Xiang Wang (Citations: 5, h-index: 1)

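Large language models (LLMs) often fail to reason under temporal constraints: when asked to answer from the standpoint of a particular point in time, they frequently draw on knowledge that did not exist at that time. This work analyzes the failure through the lens of ex-ante reasoning, in which a model must reason using only information known before a given cutoff. A systematic analysis of prompt-level interventions shows that temporal leakage is highly sensitive to how the cutoff is formulated and where the instruction is placed: explicit cutoff statements are more effective than implicit historical framings, and prefix constraints reduce leakage more effectively than suffix constraints. These results suggest that prompting can steer a model into a temporal frame but does not give it the ability to judge whether a response is temporally admissible. The authors further argue that supervised fine-tuning (SFT) alone is insufficient, because ex-ante correctness is not an intrinsic property of a response but a relation between the response and the cutoff. To address this, the work proposes Temporal Critique Fine-Tuning (TCFT), a framework that trains models to acquire cutoff-aware temporal verification. Given a query, a cutoff, and a candidate response, TCFT trains the model to identify post-cutoff leakage, explain temporal boundary violations, and judge temporal admissibility. In experiments with Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct, TCFT consistently outperformed prompting and SFT baselines, reducing average leakage by 41.89 and 37.79 percentage points, respectively.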
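The prompt-level comparison described above (explicit cutoff statement, placed as a prefix or as a suffix) can be pictured with a minimal sketch. The function name `build_prompt`, the wording of the constraint, and the sample question are all hypothetical; the listing does not give the paper's actual prompt templates.

```python
from datetime import date


def build_prompt(question: str, cutoff: date, placement: str = "prefix") -> str:
    """Attach an explicit cutoff statement before or after the question."""
    # Hypothetical wording: the paper's real instruction text is not shown here.
    constraint = (
        "Answer using only information that was knowable before "
        f"{cutoff.isoformat()}."
    )
    if placement == "prefix":
        # Constraint first, then the question.
        return f"{constraint}\n\n{question}"
    if placement == "suffix":
        # Question first, then the constraint.
        return f"{question}\n\n{constraint}"
    raise ValueError(f"unknown placement: {placement!r}")


question = "Which country most recently won the FIFA World Cup?"
cutoff = date(2020, 1, 1)

prefix_prompt = build_prompt(question, cutoff, "prefix")
suffix_prompt = build_prompt(question, cutoff, "suffix")

print(prefix_prompt)
print("---")
print(suffix_prompt)
```

Both prompts carry the same explicit cutoff; only its position differs, which is the variable the abstract reports as affecting leakage.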

Original Abstract

Large language models (LLMs) often fail to reason under temporal cutoffs: when prompted to answer from the standpoint of an earlier time, they exploit knowledge that became available only later. We study this failure through the lens of ex-ante reasoning, where a model must rely exclusively on information knowable before a cutoff. Through a systematic analysis of prompt-level interventions, we find that temporal leakage is highly sensitive to cutoff formulation and instruction placement: explicit cutoff statements outperform implicit historical framings, and prefix constraints reduce leakage more effectively than suffix constraints. These findings indicate that prompting can steer models into a temporal frame, but does not endow them with the ability to verify whether a response is temporally admissible. We further argue that supervised fine-tuning is insufficient, since ex-ante correctness is not an intrinsic property of an answer, but a relation between the answer and the cutoff. To address this gap, we propose TCFT, a Temporal Critique Fine-Tuning framework that trains models to acquire cutoff-aware temporal verification. Given a query, a cutoff, and a candidate response, TCFT teaches the model to identify post-cutoff leakage, explain temporal boundary violations, and judge temporal admissibility. Experiments with Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct show that TCFT consistently outperforms prompting and SFT baselines, reducing average leakage by 41.89 and 37.79 percentage points, respectively.
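The claim that ex-ante correctness is a relation between an answer and a cutoff, rather than a property of the answer alone, can be illustrated with a toy sketch. It is a rule-based stand-in, not the TCFT method: TCFT trains a model to produce such critiques, whereas this sketch assumes each claim already carries a hypothetical `known_since` date. The class names, field names, and the convention that a claim dated exactly on the cutoff is admissible are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Claim:
    text: str
    known_since: date  # hypothetical annotation: when the fact became knowable


@dataclass(frozen=True)
class Critique:
    leaked: Tuple[str, ...]  # claims that rely on post-cutoff knowledge
    explanation: str         # short account of the boundary violation
    admissible: bool         # verdict for this (response, cutoff) pair


def critique(claims: List[Claim], cutoff: date) -> Critique:
    """Judge a candidate response relative to a cutoff (toy rule-based version)."""
    # Assumed convention: a claim is leaked only if it became knowable after the cutoff.
    leaked = tuple(c.text for c in claims if c.known_since > cutoff)
    if leaked:
        explanation = (
            f"{len(leaked)} claim(s) rely on information that became "
            f"available after {cutoff.isoformat()}."
        )
    else:
        explanation = "No post-cutoff information is used."
    return Critique(leaked=leaked, explanation=explanation, admissible=not leaked)


# One candidate response, judged against two different cutoffs.
response = [Claim("Argentina won the 2022 FIFA World Cup", date(2022, 12, 18))]

early = critique(response, cutoff=date(2020, 1, 1))
late = critique(response, cutoff=date(2023, 1, 1))

print(early.admissible, early.explanation)
print(late.admissible, late.explanation)
```

The same response is inadmissible under the earlier cutoff and admissible under the later one, so a label attached to the answer alone cannot capture the judgment. The three output fields mirror the three behaviours the abstract lists: identify leakage, explain the violation, and judge admissibility.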

