Research · Per Ardua

Causal Basis Discovery for Domain-Selective Noise Injection

Testing whether causally derived directions outperform classification-optimized INLP

AI-14 Activation Geometry DOI

Executive Summary

Paper X (the concentration barrier) established that linear interventions face a fundamental dimensionality constraint. A natural follow-up question is whether the barrier can be circumvented by using causally derived directions rather than classification-optimized ones. Iterative Nullspace Projection (INLP) finds the directions that best separate domains in activation space, but these are not necessarily the directions through which domain-specific computation flows. Causal methods (activation patching, contrastive activations, gradient-based attribution) might instead recover directions better aligned with the model's actual domain processing.
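As a concrete reference point for one of the causal methods named above, a contrastive-activation direction is often taken to be the normalized difference between the mean activation on target-domain inputs and the mean activation on other inputs. The sketch below assumes that form. The function names and toy numbers are illustrative and are not taken from the paper, which may construct its contrastive basis differently.

```python
import math

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def contrastive_direction(target_acts, other_acts):
    """Unit vector pointing from the other-domain mean to the target-domain mean.

    Assumption: a difference-of-means construction; the paper's exact
    contrastive method is not specified in this summary.
    """
    mu_target = mean_vector(target_acts)
    mu_other = mean_vector(other_acts)
    diff = [a - b for a, b in zip(mu_target, mu_other)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("domain means coincide; no contrastive direction")
    return [x / norm for x in diff]

# Toy 3-dimensional activations (made-up numbers, not data from the paper).
target_acts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
other_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
direction = contrastive_direction(target_acts, other_acts)
print(direction)
```

INLP, by contrast, repeatedly fits a linear domain classifier and projects the activations onto the nullspace of its weights, so the two procedures optimize different objectives and need not return similar directions.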

This paper tests that hypothesis directly and finds it refuted. INLP remains the only basis with positive selectivity (+0.618), while all three causal bases produce anti-selective effects. The pairwise overlaps between INLP and the causal bases are near zero (0.011 to 0.016), indicating that classification-relevant and causally relevant directions occupy almost entirely disjoint subspaces. This establishes a classification-intervention dissociation: the directions that best identify domain are not the directions that best influence domain-specific processing.
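To make the overlap figures easier to interpret, the following sketch computes one common overlap measure between two subspaces: the mean squared cosine of their principal angles, which is 1 for identical subspaces and 0 for orthogonal ones. This summary does not state which overlap measure the paper uses, so the definition, the function names, and the toy dimensions here are all assumptions.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, tol=1e-10):
    """Modified Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are already in the span
            basis.append([wi / norm for wi in w])
    return basis

def subspace_overlap(vectors_a, vectors_b):
    """Assumed measure: ||A^T B||_F^2 / min(k_a, k_b), a value in [0, 1]."""
    a = orthonormalize(vectors_a)
    b = orthonormalize(vectors_b)
    total = sum(dot(u, v) ** 2 for u in a for v in b)
    return total / min(len(a), len(b))

# Same plane described by two different spanning sets.
same = subspace_overlap([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                        [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]])
# Two orthogonal lines.
disjoint = subspace_overlap([[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]])

# Two random k-dimensional subspaces of a d-dimensional space (toy sizes).
rng = random.Random(0)
d, k = 64, 4
rand_a = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
rand_b = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
chance = subspace_overlap(rand_a, rand_b)
print(same, disjoint, chance)
```

Under this definition, two independent random k-dimensional subspaces of a d-dimensional space have expected overlap k/d, which gives a chance baseline for reading small overlap values.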

This result has significant implications for the broader activation geometry program. It means the concentration barrier cannot be escaped simply by finding "better" directions: the problem is structural, not methodological. The geometry of domain classification and the geometry of domain computation are fundamentally different, and none of the linear basis discovery methods tested here bridges the gap.

Key Findings

  • INLP is uniquely selective: It is the only basis with positive selectivity (+0.618); all three causal bases (patching, contrastive, gradient) produce anti-selective effects
  • Near-zero subspace overlap: Pairwise overlaps between INLP and the causal bases range from 0.011 to 0.016, indicating that they occupy almost entirely disjoint subspaces
  • Classification-intervention dissociation: Directions that best classify domain membership are not the directions through which domain-specific computation flows
  • Causal bases are anti-selective: Injecting noise along causally derived directions disrupts all domains roughly equally or preferentially disrupts non-target domains
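The findings above rest on two operations: injecting noise confined to a chosen basis, and scoring how selectively that noise harms the target domain. The sketch below illustrates both under stated assumptions. The selectivity formula is hypothetical (a normalized difference in [-1, 1], positive when the target domain is hurt more, negative for anti-selective effects), and the paper's actual metric, noise scale, and injection site are not given in this summary.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(vec, basis):
    """Orthogonal projection of `vec` onto span(basis); basis must be orthonormal."""
    out = [0.0] * len(vec)
    for b in basis:
        c = dot(vec, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out

def inject_noise(activation, basis, sigma, rng):
    """Add isotropic Gaussian noise, restricted to span(basis), to one activation."""
    noise = [rng.gauss(0.0, sigma) for _ in activation]
    shaped = project(noise, basis)
    return [a + n for a, n in zip(activation, shaped)]

def selectivity(target_damage, other_damage):
    """Hypothetical score: (target - other) / (target + other).

    Inputs are non-negative degradation measures (e.g. loss increase) on the
    target domain and on the remaining domains.
    """
    total = target_damage + other_damage
    if total == 0:
        return 0.0
    return (target_damage - other_damage) / total

# Toy example: noise confined to the first coordinate axis.
rng = random.Random(0)
basis = [[1.0, 0.0, 0.0]]
activation = [0.5, -1.0, 2.0]
noisy = inject_noise(activation, basis, sigma=0.1, rng=rng)

selective = selectivity(0.8, 0.2)       # target hurt more: positive score
anti_selective = selectivity(0.2, 0.8)  # other domains hurt more: negative score
print(noisy, selective, anti_selective)
```

In this toy case only the first coordinate of the activation changes, because the noise is projected onto the basis before it is added.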

Key References

  • McEntire (2026) — The Concentration Barrier: dimensionality bounds on selectivity (Paper X)
  • McEntire (2026) — Shaped Noise Injection: terminal measurement limit (Paper VIII)
  • McEntire (2026) — Layer-Resolved Response Tensor: selectivity profile (Paper IX)
  • Ravfogel et al. (2020) — Iterative Nullspace Projection
  • Vig et al. (2020) — Causal mediation analysis in neural networks
