Riding Brainwaves in LLM Space: Understanding Activation Patterns Using Individual Neural Signatures

cs.CL Ajan Subramanian, Sumukh Bettadapura, Rohan Sathish · Mar 23, 2026
Plain-language introduction

As consumer-grade EEG headphones enter the market, a critical question emerges: can language models adapt to your specific neural signature? This paper demonstrates that frozen LLMs already contain person-specific linear directions in their activation spaces that predict individual brain activity during reading, achieving a ninefold improvement over population averages. The findings suggest that deep neural networks encode stable, individual cognitive fingerprints that could enable future brain-computer interfaces to personalize AI to the user wearing the headset.

Critical review
Verdict
Bottom line

The paper presents compelling evidence that frozen language models encode stable, person-specific neural signatures. Using word-level EEG from 30 participants reading naturalistic sentences, the authors show that individual linear probes substantially outperform a pooled population probe on every EEG feature tested, with high-gamma power showing a ninefold improvement ($\rho=0.183$ vs. $\rho=0.020$, $p<10^{-4}$). The characterization through cross-person transfer, temporal stability, and residualization analyses provides strong support for the claim that these directions reflect genuine individual differences rather than spurious overfitting. As stated in the abstract, "person-specific probes outperform a single population probe on every EEG feature tested; for high-gamma power, the person-specific probe achieves $\rho=0.183$, a ninefold improvement over the population probe ($\rho=0.020$, $p<10^{-4}$)."

“Person-specific probes outperform a single population probe on every EEG feature tested; for high-gamma power, the person-specific probe achieves $\rho=0.183$, a ninefold improvement over the population probe ($\rho=0.020$, $p<10^{-4}$)”
Subramanian et al. · Abstract
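To make the person-specific versus population comparison concrete, here is a minimal sketch on synthetic data. It is not the authors' code. The data generator, the fixed Ridge `alpha` of 1.0, the single train/test split, and the use of Spearman correlation for $\rho$ are all assumptions made for illustration; the paper's actual features are PCA-reduced hidden states and word-level EEG power.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_people, n_words, dim = 5, 400, 20  # hypothetical sizes, far smaller than ZuCo

# Stand-in for word-level model features shared by all readers.
X = rng.standard_normal((n_words, dim))

# Assumption: each person's signal mixes a weak shared direction
# with a stronger individual direction, plus noise.
shared = rng.standard_normal(dim)
directions = [0.3 * shared + rng.standard_normal(dim) for _ in range(n_people)]
Y = np.stack([X @ w + 2.0 * rng.standard_normal(n_words) for w in directions])

train, test = np.arange(0, 300), np.arange(300, 400)

# Population probe: one Ridge model fit on everyone's pooled training data.
pop_probe = Ridge(alpha=1.0).fit(
    np.tile(X[train], (n_people, 1)), Y[:, train].ravel()
)

# Person-specific probes: one Ridge model per person.
person_probes = [Ridge(alpha=1.0).fit(X[train], Y[p, train]) for p in range(n_people)]

def held_out_rho(model, person):
    """Spearman correlation between predictions and one person's held-out signal."""
    return spearmanr(model.predict(X[test]), Y[person, test])[0]

pop_mean = np.mean([held_out_rho(pop_probe, p) for p in range(n_people)])
person_mean = np.mean([held_out_rho(person_probes[p], p) for p in range(n_people)])

print(f"population probe rho:      {pop_mean:.3f}")
print(f"person-specific probe rho: {person_mean:.3f}")
```

In this toy setting the gap is large by construction, because the individual directions dominate the shared one. The sketch only shows the shape of the comparison, not the magnitude reported in the paper.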
What holds up

The core experimental finding, that person-specific linear probes predict EEG power significantly better than population probes, is robust across multiple validations. The effect replicates in an independent LLaMA 3.1 8B model with similar correlations ($\rho=0.196$ vs. $0.012$ for high-gamma at Layer 24), survives confound controls for word frequency, length, and surprisal, and passes targeted negative controls including shuffled labels and fixation counts. The demonstration that these directions are non-transferable across individuals and temporally stable strongly suggests the probes capture genuine neural signatures. The authors note: "Self-prediction averages $\rho=0.369$, while cross-person prediction drops to $\rho=0.143$ ($p=5.8\times10^{-20}$, paired $t$-test)" and "The cosine similarity between the two weight vectors [split-half] averages $0.824$ ($p=6.6\times10^{-31}$)."

“Self-prediction averages $\rho=0.369$, while cross-person prediction drops to $\rho=0.143$ ($p=5.8\times10^{-20}$, paired $t$-test)”
Subramanian et al. · Section 4.2
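The two characterization analyses quoted above, cross-person transfer and split-half stability, can be sketched as follows. This is an illustrative reconstruction on synthetic data, not the paper's procedure: the data generator, the fixed `alpha`, the Spearman metric, and the use of word order as a stand-in for time are all assumptions.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_people, n_words, dim = 6, 600, 20  # hypothetical sizes

X = rng.standard_normal((n_words, dim))
shared = rng.standard_normal(dim)
# Assumption: each person's direction = partial shared component + individual component.
W = np.stack([0.5 * shared + rng.standard_normal(dim) for _ in range(n_people)])
Y = X @ W.T + 2.0 * rng.standard_normal((n_words, n_people))  # shape (n_words, n_people)

half_a, half_b, test = np.arange(0, 200), np.arange(200, 400), np.arange(400, 600)
train = np.concatenate([half_a, half_b])

probes = [Ridge(alpha=1.0).fit(X[train], Y[train, p]) for p in range(n_people)]

# transfer[i, j]: probe trained on person i, scored on person j's held-out words.
transfer = np.array([
    [spearmanr(probes[i].predict(X[test]), Y[test, j])[0] for j in range(n_people)]
    for i in range(n_people)
])
self_rho = np.mean(np.diag(transfer))
other_rho = np.mean(transfer[~np.eye(n_people, dtype=bool)])

def cosine(u, v):
    """Cosine similarity between two probe weight vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Split-half stability: fit two probes per person on disjoint halves
# of the training words, then compare their weight vectors.
split_half = np.mean([
    cosine(
        Ridge(alpha=1.0).fit(X[half_a], Y[half_a, p]).coef_,
        Ridge(alpha=1.0).fit(X[half_b], Y[half_b, p]).coef_,
    )
    for p in range(n_people)
])

print(f"self rho:  {self_rho:.3f}")
print(f"other rho: {other_rho:.3f}")
print(f"split-half cosine: {split_half:.3f}")
```

A high diagonal with a low off-diagonal in `transfer` is the pattern the review describes as non-transferability; a high split-half cosine is the stability pattern. The synthetic noise level here is much lower than in real EEG, so the absolute values are not comparable to the paper's.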
Main concerns

The absolute correlation values remain modest ($\rho \approx 0.18$), raising questions about practical utility for real-time brain-computer interfaces despite the large relative improvement over baselines. The study's sample size of 30 participants, while standard for simultaneous EEG and eye-tracking, limits generalization to broader populations. Additionally, the causal interpretation remains unestablished: the paper identifies correlational directions but does not demonstrate that manipulating these directions during generation would produce meaningful personalization, nor does it disentangle whether the signals reflect stable neuroanatomy or idiosyncratic reading behaviors. The authors acknowledge these limitations explicitly: "Our probes are linear, and nonlinear methods might capture additional structure, though they risk overfitting at the current data scale (roughly 5,000 words per participant). We do not attempt causal interventions: the person-specific directions are identified, not injected into the residual stream. The ZuCo dataset provides 30 participants, which is standard for simultaneous EEG and eye-tracking but modest for claims about the generality of individual differences."

“Our probes are linear, and nonlinear methods might capture additional structure, though they risk overfitting at the current data scale (roughly 5,000 words per participant). We do not attempt causal interventions: the person-specific directions are identified, not injected into the residual stream. The ZuCo dataset provides 30 participants, which is standard for simultaneous EEG and eye-tracking but modest for claims about the generality of individual differences.”
Subramanian et al. · Section 6, Limitations
Evidence and comparison

The evidence supports the central claim that LLM activations contain person-specific neural information inaccessible to population-level analysis, with the fixation count negative control ($p=0.360$) providing a compelling baseline comparison. However, the comparison to static embeddings qualifies this picture: while contextualized LLM representations show a person-specific advantage of $\Delta=0.163$, static GloVe embeddings show a smaller but still significant advantage ($\Delta=0.063$), suggesting that individual differences are partially accessible even from non-contextual word vectors. The authors fairly contextualize their work against population-averaging paradigms like Brain-Score (Schrimpf et al., 2021) and representation engineering (Zou et al., 2023), accurately noting that prior work evaluates alignment at the population level or targets text-defined concepts rather than individual readers. As stated: "The GloVe control... uses static 50-dimensional word embeddings: a small but significant advantage emerges ($\Delta=0.063$, $p=4.8\times10^{-7}$), yet it is $2.6\times$ smaller than the LLM's ($\Delta=0.163$)."

“The GloVe control... uses static 50-dimensional word embeddings: a small but significant advantage emerges ($\Delta=0.063$, $p=4.8\times10^{-7}$), yet it is $2.6\times$ smaller than the LLM's ($\Delta=0.163$) and absolute accuracy is $3.2\times$ lower ($\rho=0.057$ vs. $0.183$)”
Subramanian et al. · Section 5.2
“We identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience... We showcase how these methods can provide traction on a wide range of safety-relevant problems”
Zou et al., 2023 · Abstract
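The ratios quoted in this review can be checked directly from the reported values. The first line assumes that $\Delta$ denotes the person-specific minus population correlation for high-gamma power, which is consistent with the numbers but is an inference rather than a definition taken from the paper.

```latex
\begin{align*}
0.183 - 0.020 &= 0.163 && \text{(matches the reported LLM advantage } \Delta\text{)}\\
\frac{0.183}{0.020} &\approx 9.15 && \text{(the ``ninefold'' improvement)}\\
\frac{0.163}{0.063} &\approx 2.59 && \text{(LLM vs. GloVe advantage, reported as } 2.6\times\text{)}\\
\frac{0.183}{0.057} &\approx 3.21 && \text{(LLM vs. GloVe absolute accuracy, reported as } 3.2\times\text{)}
\end{align*}
```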
Reproducibility

The experimental pipeline is described with sufficient detail for reproduction, including specific model identifiers (Qwen/Qwen2.5-7B-Instruct, meta-llama/Llama-3.1-8B-Instruct), PCA dimensionality reduction to 50 components, Ridge regularization grid ($\alpha \in \{0.01,0.1,1,10,100,1000\}$), and 5-fold sentence-stratified cross-validation. The ZuCo 1.0 and 2.0 datasets are publicly available, and compute requirements (NVIDIA L4 GPU, 11-17 hours) and software versions (PyTorch 2.7.1, cuML 25.12) are documented. However, no code repository URL or persistent identifier is provided in the text, and the PCA projection is fit on the full vocabulary of 12,200 words, requiring exact reproduction of the tokenization pipeline. Without released preprocessing scripts or word-level activation extractions, independent reproduction of the exact feature representations may be challenging. The authors state: "PCA is fit on the full set of 12,200 unique words per layer, projecting to 50 dimensions. Ridge regression $\alpha$ is selected from $\{0.01,0.1,1,10,100,1000\}$ via held-out validation... 5-fold cross-validation is sentence-stratified."

“PCA is fit on the full set of 12,200 unique words per layer, projecting to 50 dimensions. Ridge regression $\alpha$ is selected from $\{0.01,0.1,1,10,100,1000\}$ via held-out validation on fold 1. 5-fold cross-validation is sentence-stratified: all words from a given sentence appear in the same fold.”
Subramanian et al. · Appendix C
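The pipeline described in Appendix C might look roughly like the sketch below, written with scikit-learn on synthetic data. Several points are assumptions rather than details confirmed by the paper: the random stand-in features and target, the reading of "held-out validation on fold 1" as picking `alpha` by its score on the first fold's held-out split, the Spearman metric, and the use of `GroupKFold` to keep all words of a sentence in one fold. The paper reports cuML, so its actual calls may differ.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n_words, hidden_dim, words_per_sentence = 600, 128, 10  # hypothetical sizes

# Stand-ins for one layer's hidden states and one person's EEG feature.
H = rng.standard_normal((n_words, hidden_dim))
sentence_id = np.arange(n_words) // words_per_sentence
y = H @ rng.standard_normal(hidden_dim) + 5.0 * rng.standard_normal(n_words)

# PCA is fit on the full word set and projects to 50 dimensions, as quoted.
Z = PCA(n_components=50, random_state=0).fit_transform(H)

alphas = [0.01, 0.1, 1, 10, 100, 1000]

# Grouping by sentence keeps every word of a sentence in the same fold.
folds = list(GroupKFold(n_splits=5).split(Z, y, groups=sentence_id))

def fold_rho(alpha, train_idx, test_idx):
    """Fit Ridge on the training words and score Spearman rho on held-out words."""
    model = Ridge(alpha=alpha).fit(Z[train_idx], y[train_idx])
    return spearmanr(model.predict(Z[test_idx]), y[test_idx])[0]

# Assumed reading of "validation on fold 1": choose alpha on the first split only.
best_alpha = max(alphas, key=lambda a: fold_rho(a, *folds[0]))
scores = [fold_rho(best_alpha, tr, te) for tr, te in folds]

# Sanity check: no sentence should appear on both sides of any split.
leaks = [np.intersect1d(sentence_id[tr], sentence_id[te]).size for tr, te in folds]

print(f"selected alpha: {best_alpha}")
print(f"mean rho over 5 folds: {np.mean(scores):.3f}")
print(f"sentences shared between train and test: {sum(leaks)}")
```

Under this reading the first fold is used both to select `alpha` and to report a score, which is exactly the kind of detail that released scripts would settle.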
Abstract

Consumer-grade EEG is entering everyday devices, from earbuds to headbands, raising the question of whether language models can be adapted to individual neural responses. We test this by asking whether frozen LLM representations encode person-specific EEG signals, directions in activation space that predict one person's brain activity but not another's. Using word-level EEG from 30 participants reading naturalistic sentences (ZuCo corpus), we train a separate linear probe for each person, mapping hidden states from a frozen Qwen 2.5 7B to that individual's EEG power. Person-specific probes outperform a single population probe on every EEG feature tested; for high-gamma power, the person-specific probe achieves rho = 0.183, a ninefold improvement over the population probe (rho = 0.020, p < 10^-4). A negative control, fixation count, shows no person-specific advantage (p = 0.360); fixation count reflects word length and frequency rather than individual cognition. The individual directions are temporally stable (split-half cosine = 0.824), non-transferable across people (self rho = 0.369 vs. other rho = 0.143, p < 10^-19), and distinct from the shared population signal: person-specific probes retain predictive power after the population component is removed. The person-specific signal concentrates in the model's deep layers, rising consistently with depth and peaking at Layer 24 of 28. The results are consistent across architectures (LLaMA 3.1 8B) and survive word-level confound controls. Frozen language models contain stable, person-specific neural directions in their deep layers, providing a geometric foundation for EEG-driven personalization.
