2605.30233v1 May 28, 2026 cs.CL

Do Language Models Track Entities Across State Changes?

Gabriel Franco
Gabriel Franco
Citations: 5
h-index: 2
Derry Wijaya
Derry Wijaya
Citations: 39
h-index: 4
Qiao Zhao
Qiao Zhao
Citations: 10
h-index: 2
Zilu Tang
Zilu Tang
Citations: 82
h-index: 5
Aaron Mueller
Aaron Mueller
Citations: 20
h-index: 3
Sebastian Schuster
Sebastian Schuster
Citations: 39
h-index: 3
Najoung Kim
Najoung Kim
Citations: 5,934
h-index: 21

Entity tracking (ET), the ability to keep track of states, is a fundamental skill that underlies complex reasoning. An increasing amount of work investigates how transformer language models (LMs) solve entity binding $\textit{without}$ state changes. However, there is limited understanding of how non-toy LMs address ET problems of realistic difficulties expressed in natural language. To this end, we investigate the mechanisms underlying ET in more complex scenarios featuring multiple state-changing operations. We find that LMs do not incrementally track world states across tokens or query-relevant states across layers, but simply aggregate relevant information in parallel at the last token when the query becomes evident. We further investigate mechanisms of individual operations ($\texttt{PUT}$, $\texttt{REMOVE}$, $\texttt{MOVE}$) to characterize this non-incremental ET mechanism. Surprisingly, LMs implement the $\texttt{REMOVE}$ operation with a fragile global suppression tag; this global removal mechanism predicts various failure modes that we confirm behaviorally. We provide a mechanistic solution of nullifying this tag to partially address this issue. Overall, our findings reveal that LMs solve a fundamentally sequential task using a non-sequential strategy. More broadly, our work illustrates how behavioral and mechanistic analyses can fruitfully interact. Behavioral results inform mechanistic hypotheses, and insights from mechanistic analyses help build stronger behavioral evaluations by predicting failure modes missing from existing evaluations.

0 Citations
0 Influential
10.5 Altmetric
52.5 Score
Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

Log in to request an AI analysis.

댓글

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!