2601.00282v1 Jan 01, 2026 cs.CL

Can Large Language Models Still Explain Themselves? Investigating the Impact of Quantization on Self-Explanations

Nils Feldhus

BIFOLD, TU Berlin, German Research Center for Artificial Intelligence

Citations: 419

h-index: 11

Qianli Wang

Technische Universität Berlin

Citations: 85

h-index: 5

Pepa Atanasova

Citations: 2,230

h-index: 18

Fedor Splitt

Citations: 1

h-index: 1

Simon Ostermann

Citations: 555

h-index: 10

Sebastian Möller

Citations: 52

h-index: 4

Vera Schmitt

Citations: 168

h-index: 8

Abstract

Quantization is widely used to accelerate inference and streamline the deployment of large language models (LLMs), yet its effects on self-explanations (SEs) remain unexplored. SEs, generated by LLMs to justify their own outputs, require reasoning about the model's own decision-making process, a capability that may exhibit particular sensitivity to quantization. As SEs are increasingly relied upon for transparency in high-stakes applications, understanding whether and to what extent quantization degrades SE quality and faithfulness is critical. To address this gap, we examine two types of SEs: natural language explanations (NLEs) and counterfactual examples, generated by LLMs quantized using three common techniques at distinct bit widths. Our findings indicate that quantization typically leads to moderate declines in both SE quality (up to 4.4%) and faithfulness (up to 2.38%). The user study further demonstrates that quantization diminishes both the coherence and trustworthiness of SEs (up to 8.5%). Compared to smaller models, larger models show limited resilience to quantization in terms of SE quality but better maintain faithfulness. Moreover, no quantization technique consistently excels across task accuracy, SE quality, and faithfulness. Given that quantization's impact varies by context, we recommend validating SE quality for specific use cases, especially for NLEs, which show greater sensitivity. Nonetheless, the relatively minor deterioration in SE quality and faithfulness does not undermine quantization's effectiveness as a model compression technique.
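The recommendation to validate SE quality for a specific use case can be sketched as a simple before/after comparison. This is a minimal illustration, not the paper's evaluation protocol: the function names, the per-example scores, and the tolerance are all hypothetical, and it assumes a relative decline in the mean score, which the abstract does not specify.

```python
from statistics import mean


def relative_drop(baseline_scores, quantized_scores):
    """Relative decline, in percent, of the mean score after quantization.

    Assumption: scores are per-example quality or faithfulness values where
    higher is better. A negative result means the quantized model scored higher.
    """
    base = mean(baseline_scores)
    quant = mean(quantized_scores)
    if base == 0:
        raise ValueError("baseline mean is zero; relative drop is undefined")
    return 100.0 * (base - quant) / base


def validate(baseline_scores, quantized_scores, max_drop_pct):
    """Return (drop, ok), where ok is True if the drop stays within tolerance.

    max_drop_pct is a hypothetical, use-case-specific tolerance chosen by
    whoever deploys the quantized model.
    """
    drop = relative_drop(baseline_scores, quantized_scores)
    return drop, drop <= max_drop_pct


if __name__ == "__main__":
    # Made-up scores for illustration only; they are not results from the paper.
    full_precision = [0.82, 0.79, 0.85, 0.80]
    quantized = [0.80, 0.76, 0.83, 0.77]
    drop, ok = validate(full_precision, quantized, max_drop_pct=5.0)
    print(f"relative drop: {drop:.2f}% (within tolerance: {ok})")
```

In practice the two score lists would come from running the same explanation metric on the full-precision and quantized versions of a model over the same inputs, so that the comparison isolates the effect of quantization.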

