Causal Basis Discovery for Domain-Selective Noise Injection

Executive Summary

The concentration barrier (Paper X) established that linear interventions face a fundamental dimensionality constraint. A natural follow-up question is whether the barrier can be circumvented by using causally derived directions rather than classification-optimized ones. Iterative Nullspace Projection (INLP) finds the directions that best separate domains in activation space, but these are not necessarily the directions through which domain-specific computation actually flows. Causal methods (activation patching, contrastive activations, gradient-based attribution) might instead discover directions more closely aligned with the model's actual domain processing.
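To make the classification-optimized baseline concrete, the following is a minimal sketch of an INLP-style loop: fit a linear probe for domain membership, record its direction, project that direction out of the activations, and repeat. It is an illustration under stated assumptions, not the paper's implementation: the function name `inlp_basis`, the binary-label setup, and the use of a least-squares probe in place of a trained classifier are all hypothetical choices made to keep the example short.

```python
import numpy as np

def inlp_basis(X, y, n_iters=3):
    """Return a (d, k) orthonormal basis of directions found by an INLP-style loop.

    X: (n, d) activations; y: (n,) binary domain labels in {0, 1}.
    Assumption: a least-squares linear probe stands in for a trained classifier.
    """
    d = X.shape[1]
    P = np.eye(d)                          # projector onto the not-yet-removed subspace
    t = np.where(y == 1, 1.0, -1.0)
    t = t - t.mean()                       # centered regression targets
    dirs = []
    for _ in range(n_iters):
        Xp = X @ P                         # activations with earlier directions removed
        Xp = Xp - Xp.mean(axis=0)
        w, *_ = np.linalg.lstsq(Xp, t, rcond=None)
        w = P @ w                          # keep w orthogonal to earlier directions
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing linearly decodable is left
            break
        w = w / norm
        dirs.append(w)
        P = P - np.outer(w, w)             # project the new direction out
    if not dirs:
        return np.zeros((d, 0))
    return np.stack(dirs, axis=1)
```

Each returned column is a direction along which a linear probe could still separate the two domains after all earlier columns were removed, which is the sense in which such a basis is classification-optimized rather than causally derived.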

This paper tests that hypothesis directly and finds it decisively refuted. INLP remains the only basis with positive selectivity (+0.618), while all three causal bases produce anti-selective effects. The pairwise overlaps between the INLP basis and the causal bases are near zero (0.011 to 0.016), indicating that classification-relevant and causally relevant directions occupy almost entirely disjoint subspaces. This establishes a fundamental classification-intervention dissociation: the directions that best identify domain are not the directions that best influence domain-specific processing.
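One way such a pairwise overlap could be computed is sketched below. The summary does not state the overlap metric used, so the definition here (squared Frobenius norm of the cross-Gram matrix between orthonormalized bases, divided by the smaller dimension) is an assumption, and `subspace_overlap` is a hypothetical name. Under this definition the value is 0 for orthogonal subspaces and 1 when one subspace contains the other.

```python
import numpy as np

def subspace_overlap(A, B):
    """Overlap in [0, 1] between the column spaces of A (d, ka) and B (d, kb).

    Assumed metric: ||Qa^T Qb||_F^2 / min(ka, kb), i.e. the mean squared
    cosine of the principal angles between the two subspaces.
    Columns of A and of B are assumed linearly independent.
    """
    Qa, _ = np.linalg.qr(A)                # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)                # orthonormal basis for span(B)
    k = min(Qa.shape[1], Qb.shape[1])
    return float(np.linalg.norm(Qa.T @ Qb, "fro") ** 2 / k)

# Usage with toy bases in R^4 (illustrative values only).
I = np.eye(4)
disjoint = subspace_overlap(I[:, :2], I[:, 2:4])   # no shared directions
shared = subspace_overlap(I[:, :2], I[:, 1:3])     # one shared direction out of two
```

As a point of comparison for this particular metric, two independently drawn random k-dimensional subspaces of a d-dimensional space have expected overlap k/d.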

This result has significant implications for the broader activation geometry program. It means the concentration barrier cannot be escaped by simply finding "better" directions: the problem is structural, not methodological. The geometry of domain classification and the geometry of domain computation are fundamentally different, and no linear basis discovery method bridges this gap.

Key Findings

INLP is uniquely selective: INLP is the only basis with positive selectivity (+0.618); all causal bases (patching, contrastive, gradient) produce anti-selective effects
Near-zero subspace overlap: Pairwise overlaps between INLP and causal bases range from 0.011 to 0.016, confirming they occupy almost entirely disjoint subspaces
Classification-intervention dissociation: Directions that best classify domain membership are not the directions through which domain-specific computation flows
Causal bases are anti-selective: Injecting noise along causally-derived directions disrupts all domains roughly equally or preferentially disrupts non-target domains
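The intervention behind these findings, noise injected only along a chosen basis, can be sketched as follows. This is a minimal illustration assuming isotropic Gaussian noise within the subspace; the function name `inject_subspace_noise`, the noise distribution, and the `sigma` parameter are hypothetical and are not taken from the paper. How the resulting degradation is scored as selectivity is not specified in this summary, so that step is left out.

```python
import numpy as np

def inject_subspace_noise(acts, basis, sigma, rng):
    """Add Gaussian noise to activations, confined to span(basis).

    acts:  (n, d) activations at some layer.
    basis: (d, k) directions (INLP or causally derived); need not be orthonormal.
    sigma: noise standard deviation per subspace coordinate (assumed parameter).
    """
    Q, _ = np.linalg.qr(basis)                        # orthonormalize the directions
    coeffs = rng.normal(0.0, sigma, size=(acts.shape[0], Q.shape[1]))
    return acts + coeffs @ Q.T                        # perturbation lies in span(basis)
```

Because the perturbation is built from coefficients on an orthonormal basis of the subspace, every component of the activations orthogonal to that subspace is left unchanged, so any difference in downstream disruption between bases is attributable to the directions themselves.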

Key References

McEntire (2026) — The Concentration Barrier: dimensionality bounds on selectivity (Paper X)
McEntire (2026) — Shaped Noise Injection: terminal measurement limit (Paper VIII)
McEntire (2026) — Layer-Resolved Response Tensor: selectivity profile (Paper IX)
Ravfogel et al. (2020) — Iterative Nullspace Projection
Vig et al. (2020) — Causal mediation analysis in neural networks
