2604.16042v2 Apr 17, 2026 cs.CL

Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures

Liangming Pan

Yuan Zhou

Yutong Gao

Qinglin Meng

Original Abstract

While Large Language Models (LLMs) have achieved strong performance across many NLP tasks, their opaque internal mechanisms hinder trustworthiness and safe deployment. Existing surveys in explainable AI largely focus on post-hoc explanation methods that interpret trained models through external approximations. In contrast, intrinsic interpretability, which builds transparency directly into model architectures and computations, has recently emerged as a promising alternative. This paper presents a systematic review of the recent advances in intrinsic interpretability for LLMs, categorizing existing approaches into five design paradigms: functional transparency, concept alignment, representational decomposability, explicit modularization, and latent sparsity induction. We further discuss open challenges and outline future research directions in this emerging field. The paper list is available at: https://github.com/PKU-PILLAR-Group/Survey-Intrinsic-Interpretability-of-LLMs.
