2602.24287v1 Feb 27, 2026 cs.CL

대규모 언어 모델은 스스로 생성한 답변으로부터 이점을 얻을 수 있는가?

Do LLMs Benefit From Their Own Words?

Leshem Choshen

Citations: 892

h-index: 14

Jenny Y. Huang

Citations: 16

h-index: 2

R. Astudillo

Citations: 50

h-index: 4

Tamara Broderick

Citations: 8

h-index: 2

Jacob Andreas

Citations: 121

h-index: 4

대규모 언어 모델과의 다중 턴 상호 작용에서, 일반적으로 어시스턴트의 이전 답변이 대화 기록에 유지됩니다. 본 연구에서는 이 설계 방식을 재검토하여, 대규모 언어 모델이 자신의 이전 답변을 참조하는 것이 실제로 도움이 되는지 질문합니다. 실제 다중 턴 대화를 사용하여, 세 개의 오픈 소스 추론 모델과 최첨단 모델에 대해 표준 (전체 컨텍스트) 프롬프트 방식과 모든 이전 어시스턴트 답변을 제외하는 사용자 턴 전용 프롬프트 방식을 비교했습니다. 놀랍게도, 이전 어시스턴트 답변을 제거해도 상당수의 턴에서 응답 품질에 영향을 미치지 않는다는 것을 발견했습니다. 어시스턴트 측의 기록을 생략하면 누적 컨텍스트 길이를 최대 10배까지 줄일 수 있습니다. 이러한 결과를 설명하기 위해, 다중 턴 대화에서 상당한 비율 (36.4%)이 독립적인 프롬프트를 구성하며, 많은 후속 프롬프트가 현재 사용자 턴과 이전 사용자 턴만으로 답변할 수 있을 만큼 충분한 정보를 제공한다는 것을 확인했습니다. 사용자 턴 전용 프롬프트가 전체 컨텍스트보다 훨씬 뛰어난 성능을 보이는 경우를 분석한 결과, 모델이 이전 답변에 과도하게 의존하여 오류, 환각 또는 스타일적 결함을 유발하는 '컨텍스트 오염' 현상이 발생하는 경우가 있음을 확인했습니다. 이러한 발견에 따라, 어시스턴트 측의 컨텍스트를 선택적으로 생략하는 컨텍스트 필터링 방식을 설계했습니다. 우리의 연구 결과는, 어시스턴트 기록을 선택적으로 생략하면 응답 품질을 향상시키면서 메모리 소비를 줄일 수 있음을 시사합니다.

Original Abstract

Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we revisit this design choice by asking whether large language models benefit from conditioning on their own prior responses. Using in-the-wild, multi-turn conversations, we compare standard (full-context) prompting with a user-turn-only prompting approach that omits all previous assistant responses, across three open reasoning models and one state-of-the-art model. To our surprise, we find that removing prior assistant responses does not affect response quality on a large fraction of turns. Omitting assistant-side history can reduce cumulative context lengths by up to 10x. To explain this result, we find that multi-turn conversations consist of a substantial proportion (36.4%) of self-contained prompts, and that many follow-up prompts provide sufficient instruction to be answered using only the current user turn and prior user turns. When analyzing cases where user-turn-only prompting substantially outperforms full context, we identify instances of context pollution, in which models over-condition on their previous responses, introducing errors, hallucinations, or stylistic artifacts that propagate across turns. Motivated by these findings, we design a context-filtering approach that selectively omits assistant-side context. Our findings suggest that selectively omitting assistant history can improve response quality while reducing memory consumption.

2 Citations

0 Influential

7 Altmetric

37.0 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!