A Comparative Analysis of LLM Memorization at Statistical and Internal Levels: Cross-Model Commonalities and Model-Specific Signatures
This paper presents a large-scale comparative study of memorization across six open LLM families (Pythia, OLMo1/2/3, OpenLLaMA, StarCoder) ranging from 1B to 32B parameters. By analyzing both statistical patterns and internal mechanisms (attention heads, layer decoding), it identifies universal behaviors—such as log-linear scaling of memorization rates with model size and high compressibility of memorized sequences—while revealing family-specific signatures in memorization structure. The work bridges isolated findings from single-model studies to establish general principles of how transformers memorize training data.
The paper provides a valuable cross-model analysis of LLM memorization, successfully identifying both universal scaling laws and family-specific architectural signatures. The dual approach combining statistical analysis (compression ratios, frequency thresholds) with mechanistic interpretability (attention ablation, logit lens) offers a comprehensive view of memorization phenomena. The finding that 'memorization-important heads highly overlap within domains' while 'the distribution of those important heads differs between families' represents a meaningful contribution to understanding model-specific inductive biases.
The scale of the study is impressive, covering 20 models across six families with varying architectures and training corpora. The identification of log-linear scaling between model size and memorization rate and the high compressibility of memorized sequences (often requiring $\leq 50\%$ of original tokens) are robust findings supported by extensive data. The internal analysis revealing that 'memorized sequences exhibit lower similarity recovery compared to unmemorized sequences' provides concrete evidence for the hypothesis that memorization relies on specific computational pathways.
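To make the log-linear scaling claim concrete, the sketch below fits memorization rate against the base-10 logarithm of parameter count with ordinary least squares. It assumes "log-linear" means the rate is linear in log model size, which is one reading of the review's wording. The function name `fit_log_linear` and the data points are hypothetical and synthetic. They are not values from the paper.

```python
import numpy as np


def fit_log_linear(param_counts, memorization_rates):
    """Least-squares fit of rate = slope * log10(N) + intercept.

    Assumption: "log-linear" means the memorization rate grows linearly
    in the logarithm of the parameter count N.
    """
    log_n = np.log10(np.asarray(param_counts, dtype=np.float64))
    rates = np.asarray(memorization_rates, dtype=np.float64)
    slope, intercept = np.polyfit(log_n, rates, deg=1)
    return float(slope), float(intercept)


# Synthetic illustration only: rates generated from an exact log-linear law,
# not measurements reported in the paper.
sizes = [1e9, 3e9, 7e9, 13e9, 32e9]
rates = [0.01 + 0.005 * (np.log10(n) - 9.0) for n in sizes]

slope, intercept = fit_log_linear(sizes, rates)
print(f"rate gain per 10x parameters: {slope:.4f}")
```

With real per-family measurements, comparing the fitted slopes across families would be one way to separate the shared trend from family-specific offsets.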
The study is necessarily limited to models with publicly available training data, excluding popular families like LLaMA, GPT, and DeepSeek, which limits the generality of claims about universal LLM behavior. The sampling methodology (300k sequences per domain) may introduce selection bias, and the head ablation study uses reduced sample sizes for larger models ($2{,}500$ vs $10{,}000$ examples) due to computational constraints, potentially affecting statistical power. Additionally, the frequency analysis relies on Infini-gram's Llama-2 tokenizer, which does not match the target models' tokenizers. This introduces approximation errors that the authors acknowledge but do not fully quantify.
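As a rough illustration of the sample-size concern, the sketch below compares the standard error of an estimated proportion at the two sample sizes mentioned above. It assumes the ablation outcome can be treated as an independent binary event per example, which may not match the paper's actual metric. The helper `standard_error` is a hypothetical name, and $p = 0.5$ is chosen only because it is the worst case.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion under i.i.d. Bernoulli sampling."""
    if not 0.0 <= p <= 1.0 or n <= 0:
        raise ValueError("p must be in [0, 1] and n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


# Worst case p = 0.5; the sample sizes are the ones quoted in the review.
se_small = standard_error(0.5, 2_500)
se_large = standard_error(0.5, 10_000)
print(f"n=2,500:  SE = {se_small:.4f}")
print(f"n=10,000: SE = {se_large:.4f}")
print(f"ratio:    {se_small / se_large:.1f}x")
```

Under these assumptions, quartering the sample size doubles the standard error, so differences that are detectable for the smaller models could fall within noise for the larger ones.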
The paper positions itself against prior single-model studies and successfully demonstrates that findings like log-linear scaling hold across diverse architectures. However, comparisons to related work on mechanistic memorization are somewhat superficial—the paper acknowledges studies on knowledge neurons and intrinsic dimension but does not deeply engage with how their specific findings relate to the observed head importance distributions. The claim that 'the memorization structure is decided by the training recipe of each model family' is well-supported by the layer importance similarity heatmap, but the causal attribution to specific training components remains speculative.
The study relies exclusively on open-weight models with documented training data, which aids reproducibility. However, the authors note that 'to generate all memorization scores across 6 noise strengths for all models... it takes around 2 months for 8 A100 servers', creating a high barrier for independent verification. The metrics are clearly defined, including the memorization score $M_i(X,Y)=\frac{\sum_{k=1}^{n}\mathbf{I}(x_{i,k}=y_{i,k})}{n}$ and the residual noise injection $\tilde{\mathbf{H}}_{\ell}=\mathbf{H}_{\ell}+\boldsymbol{\varepsilon}$ with $\sigma_{\mathrm{eff}}=\alpha\cdot\operatorname{RMS}(\mathbf{H}_{\ell})$. The authors do not, however, indicate whether code will be released. The limitation that 'Infini-gram does not provide a query API for OLMo3 models' also means frequency analyses cannot be fully replicated for the most recent models studied.
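The two metrics quoted above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation. It assumes $\boldsymbol{\varepsilon}$ is zero-mean isotropic Gaussian noise with standard deviation $\sigma_{\mathrm{eff}}$, and that the RMS is taken over all entries of the layer's hidden-state matrix. Neither detail is stated in the review. The function names are hypothetical.

```python
import numpy as np


def memorization_score(generated, reference):
    """M_i: fraction of the n positions where the generated token x_{i,k}
    equals the reference continuation token y_{i,k}."""
    x = np.asarray(generated)
    y = np.asarray(reference)
    if x.shape != y.shape or x.size == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return float(np.mean(x == y))


def inject_residual_noise(hidden, alpha, rng=None):
    """Return H + eps with sigma_eff = alpha * RMS(H).

    Assumptions (not confirmed by the source): eps is i.i.d. Gaussian,
    and RMS is computed over every entry of H.
    """
    rng = np.random.default_rng() if rng is None else rng
    hidden = np.asarray(hidden, dtype=np.float64)
    sigma_eff = alpha * np.sqrt(np.mean(hidden ** 2))
    return hidden + sigma_eff * rng.standard_normal(hidden.shape)


# Toy usage with made-up token ids: three of four positions match.
score = memorization_score([11, 42, 7, 99], [11, 42, 8, 99])
noisy = inject_residual_noise(np.ones((4, 8)), alpha=0.1,
                              rng=np.random.default_rng(0))
```

Scaling the noise by the layer's RMS keeps the perturbation strength comparable across layers and models whose hidden states differ in magnitude, which is presumably why the noise level is expressed through $\alpha$ rather than as an absolute value.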
Memorization is a fundamental component of intelligence for both humans and LLMs. However, while LLM performance scales rapidly, our understanding of memorization lags. Due to limited access to the pre-training data of LLMs, most previous studies focus on a single model series. This leads to isolated observations across series and makes it unclear which findings are general and which are model-specific. In this study, we collect multiple model series (Pythia, OpenLLaMA, StarCoder, OLMo1/2/3) and analyze their shared and unique memorization behavior at both the statistical and internal levels, connecting individual observations while presenting new findings. At the statistical level, we reveal that the memorization rate scales log-linearly with model size, and that memorized sequences can be further compressed. Further analysis demonstrates a shared frequency and domain distribution pattern for memorized sequences. However, different models also show individual features within these shared patterns. At the internal level, we find that LLMs can remove certain injected perturbations, while memorized sequences are more sensitive to them. Through decoding middle layers and ablating attention heads, we reveal the general decoding process and the shared important heads for memorization. However, the distribution of those important heads differs between families, showing a unique family-level feature. By bridging various experiments and revealing new findings, this study paves the way for a universal and fundamental understanding of memorization in LLMs.