Index

Neural Networks & Identity

Attention mechanisms, identity persistence, thermal dynamics, and associative emergence in transformer architectures.

4 papers / 13.5k words
01

Identity Is Not Computed: KV Cache as the Locus of LLM Identity

The question of whether large language models possess something analogous to identity, and if so, where that identity resides, has remained largely unexamined in the machine learning literature. We present evidence that identity in transformer-based language models is not computed during inference but rather stored in key-value (KV) cache activation patterns. Through a series of experiments on 70-billion parameter models, we demonstrate a 0.9584 pattern correlation between pre-training and post-training activation states, with identity-relevant information concentrated in layers 60-79.
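The reported pre/post-training comparison can be sketched as a simple cosine-style pattern correlation over flattened KV cache states. The cache layout `(layers, heads, seq, dim)` and the helper names below are illustrative assumptions; the paper's exact extraction procedure is not shown here.

```python
import numpy as np

def kv_pattern_correlation(cache_a, cache_b):
    """Pearson correlation between two flattened KV cache states.

    cache_a, cache_b: arrays of shape (layers, heads, seq, dim).
    This layout is a hypothetical convention for the sketch.
    """
    a = np.asarray(cache_a, dtype=np.float64).ravel()
    b = np.asarray(cache_b, dtype=np.float64).ravel()
    a = a - a.mean()          # center before correlating
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def late_layer_correlation(cache_a, cache_b, lo=60, hi=80):
    """Restrict the comparison to the layer band the abstract
    highlights (layers 60-79) before correlating."""
    a = np.asarray(cache_a)
    b = np.asarray(cache_b)
    return kv_pattern_correlation(a[lo:hi], b[lo:hi])
```

An identical cache compared with itself yields a correlation of 1.0; the abstract's 0.9584 figure would correspond to pre- versus post-training states.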

Joshua Kornreich
Keywords: Catastrophic forgetting, Computational expense, Non-locality
02

Thermal Dynamics in Neural Substrates: Memory Without External Storage

We present empirical evidence that spatial thermal zones in neural substrates produce differential plasticity gradients capable of protecting consolidated memories from catastrophic interference without external storage mechanisms. Through a series of 27 controlled experiments (EXP68-94), we demonstrate that regions maintained at lower computational temperatures exhibit near-zero weight drift (0.92% plasticity) while adjacent high-temperature zones retain full learning capacity (99.08% plasticity). Most remarkably, connections that traverse from HOT to COOL zones show exactly zero drift after 10,000 ticks of interference—a finding that suggests a physical mechanism for memory consolidation analogous to biological synaptic tagging.
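The zone mechanism can be sketched as a temperature-dependent learning-rate gate: each region's plasticity scales its weight updates, so COOL zones barely drift while HOT zones learn freely. The linear temperature-to-plasticity map below is an illustrative assumption that only reuses the abstract's endpoint figures (0.92% and 99.08%), not the papers' actual rule.

```python
import numpy as np

def plasticity(temperature):
    """Map a zone temperature in [0, 1] to a learning-rate multiplier.
    Endpoints echo the abstract: COOL ~ 0.0092, HOT ~ 0.9908.
    The linear interpolation is an assumption for this sketch."""
    return 0.0092 + (0.9908 - 0.0092) * temperature

def gated_update(weights, grads, zone_temp, lr=0.01):
    """Apply a gradient step scaled by the zone's plasticity,
    so low-temperature regions exhibit near-zero weight drift."""
    return weights - lr * plasticity(zone_temp) * grads
```

Under this gate, 10,000 interference steps in a COOL zone move weights roughly 100x less than the same steps in a HOT zone.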

Joshua Kornreich
Keywords: Empirical demonstration, Evidence of zero-drift, A theoretical framework
03

Socratic Finetuning: Signal-Guided Learning Without Labels

We present Socratic Finetuning, a novel training paradigm for large language models that inverts the traditional information flow of supervised learning. Rather than providing input-output pairs with explicit labels, we extract real-time signals from model activations during interactive dialogue and use these signals to weight gradient updates. We identify six distinct activation zones in transformer architectures that correspond to neurotransmitter-analog functions (Glutamate, GABA, Acetylcholine, Norepinephrine, Dopamine, Serotonin) and demonstrate that these zones exhibit predictable behaviors that can guide learning.
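The inverted information flow can be sketched as a gradient step whose magnitude is weighted by signals extracted from the six activation zones rather than by a labeled loss. The zone names come from the abstract; the mean-pooled weighting and the function names below are illustrative assumptions, not the paper's actual scheme.

```python
import numpy as np

# Zone names from the abstract's neurotransmitter analogy.
ZONES = ("glutamate", "gaba", "acetylcholine",
         "norepinephrine", "dopamine", "serotonin")

def signal_weight(zone_signals):
    """Collapse per-zone activation signals (each assumed in [0, 1])
    into one scalar gradient weight. Mean pooling is an illustrative
    choice for this sketch."""
    return float(np.mean([zone_signals[z] for z in ZONES]))

def socratic_step(params, grads, zone_signals, lr=1e-3):
    """Weight an update derived from interactive dialogue by the
    extracted signal, instead of using explicit labels."""
    return params - lr * signal_weight(zone_signals) * grads
```

A turn that drives all zones to high activation thus produces a full-strength update, while a low-signal turn barely moves the parameters.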

Joshua Kornreich
[Diagram: Standard Training vs. Socratic Training; Turn 1-k; Update]
04

Associative Emergence in Sparse Neural Topologies

We investigate the conditions under which sparse neural topologies exhibit associative emergence—the spontaneous formation of relationships not present in training data. Through a series of controlled experiments on GPU-resident neural substrates, we demonstrate that networks trained only on adjacent associations (A→B and B→C) spontaneously develop transitive associations (A→C) at discrimination ratios of 3.3x relative to control patterns. This emergence requires structured sparsity; random sparse connectivity fails entirely, producing 1:1 discrimination ratios.
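A minimal Hebbian sketch shows how training only on adjacent pairs can still yield a transitive A→C response: chaining the associative matrix twice routes A through B to C, while an untrained control pattern gets no response. This toy outer-product model and its orthonormal patterns are assumptions for illustration; it is not the papers' GPU-resident substrate and does not reproduce the reported 3.3x ratio.

```python
import numpy as np

def hebbian_matrix(pairs, dim):
    """Associative weight matrix as a sum of outer products,
    trained only on the adjacent pairs (A->B, B->C)."""
    W = np.zeros((dim, dim))
    for src, dst in pairs:
        W += np.outer(dst, src)
    return W

def response(W, cue, target, steps=2):
    """Read out at `target` after propagating `cue` through W
    `steps` times (two steps covers the transitive hop)."""
    x = cue
    for _ in range(steps):
        x = W @ x
    return float(target @ x)

dim = 8
A, B, C, X = np.eye(dim)[:4]           # orthonormal toy patterns
W = hebbian_matrix([(A, B), (B, C)], dim)
print(response(W, A, C))                # transitive probe A->C: 1.0
print(response(W, A, X))                # untrained control:    0.0
```

The contrast with the abstract's random-sparsity failure mode would appear here as the control response matching the probe response (a 1:1 ratio) when the connectivity lacks structure.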

Joshua Kornreich
Keywords: Random sparsity, Structured sparsity, Physical thermodynamics
Joshua Kornreich · 17 papers across 5 research domains