2602.11908v2 Feb 12, 2026 cs.AI

When Should LLMs Be Less Specific? Selective Abstraction for Reliable Long-Form Text Generation

Ran El-Yaniv

Ido Galil

Technion

Shani Goren

Abstract

LLMs are widely used, yet they remain prone to factual errors that erode user trust and limit adoption in high-risk settings. One approach to mitigating this risk is to equip models with uncertainty estimation mechanisms that abstain when confidence is low. However, this binary "all-or-nothing" approach is excessively restrictive in long-form settings, often discarding valuable information. We introduce Selective Abstraction (SA), a framework that enables LLMs to trade specificity for reliability by selectively reducing the detail of uncertain content. We first formalize SA through the lenses of selective risk and coverage. We then propose Atom-wise Selective Abstraction, a claim-level instantiation that decomposes responses into atomic claims (short, self-contained statements each expressing a single fact) and replaces uncertain atoms with higher-confidence, less specific abstractions. To evaluate this framework, we develop a novel end-to-end pipeline for open-ended generation that instantiates risk in terms of factual correctness and measures coverage using an information-theoretic measure of retained information. Across six open-source models on the FactScore and LongFact-Objects benchmarks, atom-wise SA consistently outperforms existing baselines, improving the area under the risk-coverage curve (AURC) by up to 27.73% over claim removal. This demonstrates that reducing specificity can boost accuracy and reliability while preserving most of the responses' original meaning.
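
The atom-wise procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Atom` class, the confidence scores, the pre-computed abstractions, and the fixed threshold are all hypothetical, and the paper's information-theoretic coverage measure is not reproduced here. The `aurc` helper simply integrates a list of (coverage, risk) points with the trapezoid rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Atom:
    """Hypothetical container for one atomic claim (not from the paper)."""
    text: str                           # original, specific claim
    confidence: float                   # assumed confidence score in [0, 1]
    abstraction: Optional[str] = None   # assumed less specific rewrite, if any
    abstraction_confidence: float = 0.0


def selective_abstraction(atoms: List[Atom], threshold: float) -> List[str]:
    """Keep confident atoms; otherwise fall back to a confident abstraction.

    Atoms with neither a confident original nor a confident abstraction
    are dropped, which corresponds to plain claim removal.
    """
    kept: List[str] = []
    for atom in atoms:
        if atom.confidence >= threshold:
            kept.append(atom.text)
        elif atom.abstraction is not None and atom.abstraction_confidence >= threshold:
            kept.append(atom.abstraction)
    return kept


def aurc(points: List[Tuple[float, float]]) -> float:
    """Trapezoid-rule area under a curve given as (coverage, risk) points."""
    pts = sorted(points)
    area = 0.0
    for (c0, r0), (c1, r1) in zip(pts, pts[1:]):
        area += (c1 - c0) * (r0 + r1) / 2.0
    return area


# Illustrative usage with made-up claims and scores.
atoms = [
    Atom("Person A was born in 1950.", 0.9),
    Atom("Person A was born on 3 March.", 0.4, "Person A was born in spring.", 0.8),
    Atom("Person A won an award in 1975.", 0.2),
]
result = selective_abstraction(atoms, threshold=0.7)
curve_area = aurc([(0.0, 0.0), (0.5, 0.1), (1.0, 0.3)])
```

In this sketch, sweeping `threshold` would trace out different trade-offs between how much specific content is retained and how risky the retained content is; how the paper actually scores confidence and coverage is not specified in the abstract.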
