Executive Summary
When a language model solves a problem, its internal activations encode both what the problem is about (domain) and how the problem is structured (shape). This paper demonstrates that these two aspects occupy linearly separable subspaces in activation space — a clean double dissociation that enables a novel form of transfer learning. Domain content can be erased while preserving structural signatures, and structural templates can be transported across domains without domain-specific retraining.
The separability is established through Iterative Null-space Projection (INLP), which erases domain signal to near-chance classification accuracy while shape classification holds at 95.6% or higher across Qwen 2.5 models from 0.5B to 7B parameters. Three transfer tests validate the practical utility of this separation: cross-domain shape classification, nearest-prototype matching, and a full strip-and-rehydrate pipeline that removes domain content from a solution and reconstitutes it in a new domain.
A further contribution is the discovery that subspace-targeted stochastic resonance can find domain signal that standard INLP cannot reach. This indicates that the domain-structure separation, while predominantly linear, has a nonlinear residual component that noise injection can exploit. The finding refines the separability claim: domain and structure are linearly separable to a high degree, with a small nonlinear entanglement that requires specialized tools to detect.
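The weaker half of this claim, that a nonlinear probe can recover domain signal invisible to any linear probe, can be illustrated with a toy sketch. This is not the paper's stochastic-resonance method; it is a minimal construction (all names illustrative) where the "domain" label is encoded as the sign of a product of two coordinates, so a linear probe sits at chance while a small MLP probe recovers it:

```python
# Toy illustration (NOT the paper's stochastic-resonance procedure) of a
# nonlinear domain residual: the label depends on the product of two
# coordinates, which no single linear direction can read out.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def probe_gap(X, y, seed=0):
    """Held-out accuracy of a linear probe vs. a small MLP probe."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    lin = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                        random_state=seed).fit(X_tr, y_tr)
    return lin.score(X_te, y_te), mlp.score(X_te, y_te)

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 6))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # nonlinearly encoded "domain"
lin_acc, mlp_acc = probe_gap(X, y)       # linear near chance, MLP well above
```

A residual of this kind is exactly what a purely linear erasure tool such as INLP cannot remove, which is why the paper needs a specialized probe to detect it.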
Key Contributions and Methodology
The experimental design constructs matched problem sets across multiple domains, ensuring that identical structural templates (e.g., causal chains, comparison structures, conditional reasoning) appear in diverse domain contexts (e.g., physics, economics, biology, law). Activation fingerprints are extracted from models processing these problems, providing a matrix of activations indexed by both domain and structure.
INLP is applied to erase domain information from the activation space. This involves iteratively training linear classifiers on domain labels, projecting activations onto the null space of each classifier, and repeating until domain classification falls to chance. The critical test is whether shape classification accuracy survives this erasure. Across all model scales tested, shape accuracy remains at or above 95.6% after full domain erasure — establishing the double dissociation.
The strip-and-rehydrate pipeline operationalizes this separation for practical transfer. Given a solved problem in domain A, the pipeline extracts the structural signature (stripping domain content via null-space projection), identifies the analogous structural template in domain B (via nearest-prototype matching), and reconstitutes a domain-B solution using the transported structural template. This enables transfer without retraining on domain-B examples of the target structure.
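The strip and match stages of this pipeline can be sketched as follows, assuming a domain-erasing projection `P` of the kind INLP produces. Function names and the cosine-similarity matching rule are illustrative assumptions, not the paper's API:

```python
# Hedged sketch of nearest-prototype matching on domain-stripped
# activations: strip domain content with projection P, then assign the
# closest structural template by cosine similarity to per-shape means.
import numpy as np

def shape_prototypes(X, y_shape, P):
    """Mean domain-stripped activation for each structural template."""
    Xp = X @ P
    return {s: Xp[y_shape == s].mean(axis=0) for s in np.unique(y_shape)}

def match_shape(x, prototypes, P):
    """Strip domain content from x, then pick the nearest prototype."""
    v = x @ P
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(prototypes, key=lambda s: cos(v, prototypes[s]))
```

The rehydration stage, reconstituting a domain-B solution from the matched template, depends on the model-conditioning machinery described in the paper and is not sketched here.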
Key Findings
- Double dissociation: INLP erases domain signal to near-chance accuracy while shape classification holds at 95.6% or higher across Qwen 2.5 models from 0.5B to 7B parameters, confirming linear separability
- Three transfer mechanisms: Cross-domain shape classification, nearest-prototype matching, and strip-and-rehydrate all validate the practical utility of structural signatures
- Nonlinear residual: Subspace-targeted stochastic resonance finds domain signal that standard INLP cannot reach, revealing small nonlinear entanglement between domain and structure
- Scale invariance: The separability holds from 0.5B to 7B parameters, suggesting it is a fundamental property of transformer representations rather than an artifact of scale
- Transfer without retraining: Structural templates transported across domains produce valid solutions without any domain-specific fine-tuning
Key References
Ravfogel, S., Elazar, Y., Gonen, H., Twiton, M., & Goldberg, Y. (2020). Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection. Annual Meeting of the Association for Computational Linguistics.
Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations. Conference of the North American Chapter of the ACL.
Elhage, N., et al. (2022). Toy Models of Superposition. Anthropic Research.
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
Gentner, D. (1983). Structure-Mapping: A Theoretical Framework for Analogy. Cognitive Science, 7(2), 155-170.