2606.12113v1 Jun 10, 2026 cs.CL

Augmenting Molecular Language Models with Local $n$-gram Memory

Zijing Liu
He Cao
Yu Li
Xinni Zhang
Irwin King

Transformer-based language models for SMILES strings suffer from a locality gap: standard character-level tokenization fragments chemically meaningful motifs, forcing models to repeatedly relearn local syntax at the expense of long-range dependencies. To address this without changing the standard tokenizer, we propose MolGram, which integrates a conditional $n$-gram memory module into molecular language models. MolGram maps local string patterns to learned embeddings via scalable hash lookups and dynamically injects this local context into the hidden states. Evaluations on three tasks (unconditional molecule generation, forward reaction prediction, and single-step retrosynthesis) show that MolGram consistently improves performance. Our analyses further show that MolGram outperforms baselines with 3$\times$ more parameters, establishing explicit local pattern memory as a highly efficient inductive bias.
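
The abstract describes the mechanism only at a high level, so the following is a minimal Python sketch of what a hashed $n$-gram memory with gated injection could look like, not the authors' implementation. The class name `NGramMemory`, the CRC32 hash, the causal window (the $n$-gram ending at each position), the sigmoid gate, and all sizes are assumptions made for illustration; random values stand in for learned parameters.

```python
import math
import random
import zlib


class NGramMemory:
    """Illustrative hashed n-gram memory (hypothetical, not MolGram itself)."""

    def __init__(self, n=3, num_buckets=1024, dim=8, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.num_buckets = num_buckets
        self.dim = dim
        # Stand-in for a learned embedding table: one vector per hash bucket.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(num_buckets)]
        # Stand-in for learned gate weights over [hidden ; memory].
        self.gate_w = [rng.gauss(0.0, 0.02) for _ in range(2 * dim)]

    def bucket(self, tokens, pos):
        """Hash the n-gram ending at `pos` (shorter near the start) to a bucket."""
        start = max(0, pos - self.n + 1)
        gram = "\x1f".join(tokens[start:pos + 1])  # unit separator avoids ambiguity
        return zlib.crc32(gram.encode("utf-8")) % self.num_buckets

    def inject(self, tokens, hidden):
        """Add a gated memory vector to each position's hidden state."""
        out = []
        for pos, h in enumerate(hidden):
            m = self.table[self.bucket(tokens, pos)]
            # Assumed conditioning: the gate depends on the hidden state
            # and the retrieved memory vector.
            score = sum(w * x for w, x in zip(self.gate_w, h + m))
            g = 1.0 / (1.0 + math.exp(-score))
            out.append([hi + g * mi for hi, mi in zip(h, m)])
        return out


# Usage: character-level tokens of acetic acid, with zero hidden states.
tokens = list("CC(=O)O")
mem = NGramMemory()
hidden = [[0.0] * mem.dim for _ in tokens]
updated = mem.inject(tokens, hidden)
```

Because the lookup is a hash into a fixed-size table, memory cost in this sketch is set by `num_buckets` rather than by the number of distinct $n$-grams, at the price of occasional collisions; the tokenizer itself is left untouched.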
