2603.07482v1 Mar 08, 2026 cs.LG

Interpretable-by-Design Transformers via Architectural Stream Independence

Original Abstract

While transformers achieve strong performance, their internal decision-making processes remain opaque. We investigate whether architectural constraints can enforce interpretability by design through architectural stream independence: maintaining a token stream (carrying symbolic structure) and contextual semantics in separate streams that remain independently observable throughout processing, with integration delayed until the output. We validate this principle through the Late Fusion Architecture (LFA), which retains interpretable symbolic heads all the way through the final layers, while in standard transformers these heads dissolve by the third of six layers. We quantify this effect by introducing the Token-Position Dependence Score (PDS), with $PDS_{max}$ = 0.276 for LFA and 0.058 for the standard transformer. Crucially, intervention experiments demonstrate functional modularity: suppressing LFA's recency heads causes minimal semantic damage (Cohen's d = -0.158), versus catastrophic entanglement in baselines (d = -0.672). LFA demonstrates that architectural constraints improve the underlying learning mechanisms, averaging 42% stability versus 19% and 11% for the baseline comparisons, with extremes ranging from 50% on LFA's best pairs (6 of 12 heads position-invariant) down to 0% (complete collapse) in over-constrained cases. By preventing premature entanglement, architectural independence steers models toward semantic understanding rather than positional heuristics, establishing interpretability as an architectural design criterion enforceable through structural constraints rather than post-hoc analysis.
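The abstract states the stream-independence principle without giving implementation details. The toy sketch below illustrates the idea in plain Python under loud assumptions: the names `late_fusion_forward`, `residual_layer`, and `causal_mean_mix` are hypothetical, a uniform causal average stands in for attention, and fusion is plain per-position concatenation. It is not the LFA implementation. The point it shows is structural: the token stream never receives positional embeddings and never reads from the context stream, so each intermediate token-stream state can be inspected on its own, and the two streams meet only at the output.

```python
from typing import List, Tuple

Vector = List[float]
Stream = List[Vector]  # one vector per sequence position


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def causal_mean_mix(stream: Stream) -> Stream:
    """Hypothetical stand-in for attention: position i averages positions 0..i."""
    out: Stream = []
    running = [0.0] * len(stream[0])
    for i, vec in enumerate(stream):
        running = add(running, vec)
        out.append([x / (i + 1) for x in running])
    return out


def residual_layer(stream: Stream) -> Stream:
    """One layer that reads and writes a single stream (residual update)."""
    mixed = causal_mean_mix(stream)
    return [add(v, m) for v, m in zip(stream, mixed)]


def late_fusion_forward(
    token_emb: Stream, pos_emb: Stream, n_layers: int
) -> Tuple[Stream, List[Stream]]:
    # Token stream: symbolic identity only, no positional signal added.
    token_stream = [list(v) for v in token_emb]
    # Context stream: token identity plus position.
    context_stream = [add(t, p) for t, p in zip(token_emb, pos_emb)]
    token_trace = [token_stream]  # observable token-stream state after each layer
    for _ in range(n_layers):
        # Each stream is updated only from itself; there are no cross-stream reads.
        token_stream = residual_layer(token_stream)
        context_stream = residual_layer(context_stream)
        token_trace.append(token_stream)
    # Late fusion: the streams meet only here, by concatenation at each position.
    fused = [t + c for t, c in zip(token_stream, context_stream)]
    return fused, token_trace
```

Because the token stream never sees `pos_emb`, changing the positional embeddings leaves the entire `token_trace` unchanged while the fused output changes. That invariance is a minimal version of the "independently observable" property the abstract describes.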
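The intervention results are reported as Cohen's d, but the abstract does not say which variant was used (for example, pooled versus paired). As a minimal sketch, assuming the standard two-sample form with a pooled sample standard deviation, the effect of suppressing a set of heads on some per-example semantic score could be computed as follows. A negative d means the ablated model scores lower than the intact one. The function name `cohens_d` and the score lists are illustrative placeholders, not the paper's code or data.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(treatment: Sequence[float], control: Sequence[float]) -> float:
    """Two-sample Cohen's d with a pooled sample standard deviation (assumed variant)."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


# Placeholder per-example semantic scores, NOT data from the paper.
intact_scores = [0.82, 0.79, 0.85, 0.80]
ablated_scores = [0.80, 0.78, 0.84, 0.79]  # same inputs with the targeted heads suppressed
effect = cohens_d(ablated_scores, intact_scores)  # negative: ablation lowered the score
```

Because the same inputs are scored with and without the ablation, a paired effect size (mean difference divided by the standard deviation of the differences) would also be defensible. The pooled form is shown only because it is the most common default.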
