Executive Summary
This paper establishes structural entanglement as a geometric phenomenon through eight experiments across four transformer architectures (GPT-2 124M, Qwen-0.5B, Qwen-7B, and Qwen-7B-Instruct). The key result: random Gaussian projections to d ≥ 448 reproduce the learned entanglement intensity (EI = 1.50), while projections to the 7-dimensional informative rank show baseline EI (0.18). PCA reverses the effect. Entanglement is not learned during training; it is a consequence of encoding concepts in high-dimensional space. Stratified bootstrap confidence intervals (2,000 iterations) confirm EI is significantly above zero for all four models.
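The projection mechanics can be sketched on synthetic data. This is a minimal illustration, not the paper's pipeline: the hidden-state matrix below is a random stand-in (896 columns matches Qwen-0.5B's hidden size), and the EI metric itself is not reproduced. What it shows is the Johnson-Lindenstrauss intuition behind the result: a 1/sqrt(d) Gaussian projection to d = 448 nearly preserves geometry, while a projection to the 7-dimensional informative rank distorts it heavily.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a batch of hidden states (hypothetical data, not the
# paper's activations); 896 matches Qwen-0.5B's hidden size.
H = rng.normal(size=(200, 896))

def gaussian_project(X, d, rng):
    """Project rows of X to d dimensions with a random Gaussian map,
    scaled by 1/sqrt(d) so squared norms are preserved in expectation
    (the Johnson-Lindenstrauss construction)."""
    R = rng.normal(size=(X.shape[1], d)) / np.sqrt(d)
    return X @ R

def mean_norm_distortion(X, Z):
    """Mean relative error of squared row norms after projection."""
    return np.mean(np.abs(np.sum(Z**2, axis=1) / np.sum(X**2, axis=1) - 1.0))

# Geometry is nearly preserved at d = 448 but heavily distorted at the
# 7-dimensional informative rank.
err_448 = mean_norm_distortion(H, gaussian_project(H, 448, rng))
err_7 = mean_norm_distortion(H, gaussian_project(H, 7, rng))
```

On this synthetic data the distortion at d = 448 is a few percent while at d = 7 it is an order of magnitude larger, mirroring why the high-dimensional projection reproduces learned EI while the rank-level projection collapses to baseline.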
Superlinear amplification is confirmed: when three concepts are probed simultaneously, the triple EI exceeds the mean pairwise EI by roughly 2x (GPT-2: 1.87x, Qwen-7B: 2.15x). This is not a measurement artifact: nesting two concepts into one reduces EI below the pairwise baseline, confirming that independent concept axes drive the superlinearity. Concept-type independence is validated by replacing linguistic concepts with software engineering concepts, producing comparable EI (mean SE/Original ratio = 0.97).
RLHF accelerates entanglement crystallization during training. The instruction-tuned Qwen-7B variant reaches terminal EI faster than the base model. Together, these experiments establish that entanglement intensity is determined by the ratio d/k (hidden dimension to number of concepts), not by what is learned or how it is trained.
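The stratified bootstrap procedure mentioned above can be sketched as follows. The per-prompt EI scores and the concept-pair strata names are synthetic assumptions for illustration; only the resampling scheme (resample within each stratum, average stratum means, take percentile bounds over 2,000 replicates) reflects the method described.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-prompt EI scores grouped by concept pair
# (synthetic numbers and pair names, not the paper's measurements).
strata = {pair: rng.normal(loc=1.5, scale=0.3, size=50)
          for pair in ["pair-A", "pair-B", "pair-C"]}

def stratified_bootstrap_ci(strata, n_boot=2000, alpha=0.05, seed=2):
    """Resample within each stratum, average the stratum means, and take
    percentile bounds over n_boot bootstrap replicates."""
    rs = np.random.default_rng(seed)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        boot[b] = np.mean([rs.choice(v, size=v.size, replace=True).mean()
                           for v in strata.values()])
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])

lo, hi = stratified_bootstrap_ci(strata)
# A lower bound above zero is what "significantly above zero" means here.
```

Stratifying before resampling keeps each concept pair's contribution fixed across replicates, so the interval reflects within-pair sampling noise rather than an accidental imbalance between pairs.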
Key Findings
- Random projection reproduces entanglement: Gaussian projections to d = 448 match learned EI (1.50); projections to d = 7 yield baseline (0.18)
- PCA reverses entanglement: PCA to 112 dimensions achieves EI 0.18 with purity 0.76 vs random projection EI 0.45 with purity 0.60
- Superlinear amplification: Triple/pairwise EI ratio 1.87x (GPT-2) to 2.15x (Qwen-7B)
- Concept-type independence: SE concepts produce comparable EI (mean ratio 0.97)
- RLHF accelerates crystallization: Instruction tuning reaches terminal EI faster
- Cross-model consistency: All four architectures show EI > 1.0 at terminal layers (1.39-1.53), with bootstrap 95% CIs confirming significance
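The PCA-reversal finding can be made concrete with a toy contrast. The synthetic data below (isotropic noise plus one strong shared direction, standing in for a low informative rank inside an 896-dimensional space) and all scales are assumptions; the paper's purity and EI metrics are not computed. The sketch shows only the geometric difference between the two projections.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic activations: isotropic noise plus one dominant shared
# direction, a toy stand-in for a low informative rank in 896 dims.
n, d_model = 500, 896
latent = rng.normal(size=(n, 1))
direction = rng.normal(size=(1, d_model))
X = rng.normal(size=(n, d_model)) + 3.0 * latent * direction

def pca_project(X, d):
    """Project onto the top-d principal components (SVD of centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

def random_project(X, d, rng):
    """Random Gaussian projection, which mixes all coordinate directions."""
    return X @ (rng.normal(size=(X.shape[1], d)) / np.sqrt(d))

Z_pca = pca_project(X, 112)
Z_rand = random_project(X, 112, rng)
```

PCA concentrates the informative direction into its leading component, while a random projection smears it across all 112 output axes. That is the geometric sense in which PCA "reverses" entanglement (high purity, low EI) while a random projection of the same dimensionality preserves it.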
Key References
- McEntire (2026) — Entangled Directions (AI-25): discovers the discrimination-activation dissociation
- McEntire (2026) — The Entanglement Theorem (AI-27): formal proof of the geometric mechanism
- McEntire (2026) — The Concentration Barrier (AI-11): effective dimensionality bounds
- Johnson and Lindenstrauss (1984) — Extensions of Lipschitz mappings into a Hilbert space