2604.01404v1 Apr 01, 2026 cs.CL

Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models

Mor Geva

Google Research

Itay Yona

Daniel Barzilay

M. Karasik

Original Abstract

Language models can answer many entity-centric factual questions, but it remains unclear which internal mechanisms are involved in this process. We study this question across multiple language models. We localize entity-selective MLP neurons using templated prompts about each entity, and then validate them with causal interventions on PopQA-based QA examples. On a curated set of 200 entities drawn from PopQA, localized neurons concentrate in early layers. Negative ablation produces entity-specific amnesia, while controlled injection at a placeholder token improves answer retrieval relative to mean-entity and wrong-cell controls. For many entities, activating a single localized neuron is sufficient to recover entity-consistent predictions once the context is initialized, consistent with compact entity retrieval rather than purely gradual enrichment across depth. Robustness to aliases, acronyms, misspellings, and multilingual forms supports a canonicalization interpretation. The effect is strong but not universal: not every entity admits a reliable single-neuron handle, and coverage is higher for popular entities. Overall, these results identify sparse, causally actionable access points for analyzing and modulating entity-conditioned factual behavior.
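
The localization and ablation steps named in the abstract can be illustrated with a small sketch. The abstract does not give the paper's scoring rule, so everything here is an assumption made for illustration: the selectivity score (an entity's mean activation minus the mean over other entities, divided by within-entity noise), the function names `localize_entity_neurons` and `negative_ablate`, the reading of "negative ablation" as forcing the neuron's activation to a negative value, and the synthetic activations that stand in for real MLP activations.

```python
import numpy as np


def localize_entity_neurons(acts):
    """Pick the most entity-selective neuron for each entity.

    acts: array of shape (n_entities, n_templates, n_neurons), assumed to hold
    MLP activations collected from templated prompts about each entity.
    The scoring rule is a hypothetical stand-in, not the paper's method.
    """
    n_entities = acts.shape[0]
    mean_per_entity = acts.mean(axis=1)                       # (E, N)
    total = mean_per_entity.sum(axis=0, keepdims=True)        # (1, N)
    others_mean = (total - mean_per_entity) / (n_entities - 1)
    # Within-entity spread across templates, averaged over entities.
    noise = acts.std(axis=1).mean(axis=0, keepdims=True) + 1e-8
    score = (mean_per_entity - others_mean) / noise           # (E, N)
    best = score.argmax(axis=1)
    return best, score[np.arange(n_entities), best]


def negative_ablate(activations, neuron, scale=1.0):
    """Return a copy with one neuron forced to a negative value.

    Assumed interpretation of "negative ablation": the neuron is set to
    -scale * |original activation| rather than simply zeroed.
    """
    out = np.array(activations, dtype=float, copy=True)
    out[neuron] = -scale * abs(out[neuron])
    return out


def make_synthetic_activations(n_entities=5, n_templates=8, n_neurons=64, seed=0):
    """Synthetic data: Gaussian noise plus one planted neuron per entity."""
    rng = np.random.default_rng(seed)
    acts = rng.normal(0.0, 1.0, size=(n_entities, n_templates, n_neurons))
    planted = rng.choice(n_neurons, size=n_entities, replace=False)
    for entity, neuron in enumerate(planted):
        acts[entity, :, neuron] += 10.0  # strong entity-specific response
    return acts, planted


if __name__ == "__main__":
    acts, planted = make_synthetic_activations()
    best, scores = localize_entity_neurons(acts)
    print("planted neurons:  ", planted)
    print("localized neurons:", best)
    print("ablated example:  ", negative_ablate(acts[0, 0], best[0])[best[0]])
```

On real models the activations would come from forward hooks on the MLP layers, and the abstract's caveat still applies: not every entity admits a reliable single-neuron handle, so an argmax like this would need a threshold and causal validation before being trusted.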
