Executive Summary
Can noise shaped along domain-discriminative directions in activation space control inference-time output distributions? This paper tests the thesis across ten experimental phases on Qwen 2.5 models from 0.5B to 7B parameters, using INLP-discovered domain directions as the noise-shaping basis. The results are partially positive and partially negative, and the negative result is the more important finding.
Shaped noise achieves modest domain-specific entropy reductions (up to 6.1% for legal at 7B) and breaks 100% of repetition loops at both 3B and 7B, outperforming temperature scaling and matching repetition penalty on escape rate while achieving near-perfect token uniqueness (0.99+). These results demonstrate that the mechanism works — shaped activation perturbation does influence output distributions in the predicted direction.
However, cross-domain selectivity is fundamentally limited. When targeting one domain, non-target domains respond comparably or more strongly. All correction attempts — scalar cancellation, subspace decomposition, and optimal linear correction via matrix inversion — fail. The empirical response matrix is invertible (determinant = -84.5, condition number = 21.2), the linear algebra is clean, and the optimal weight vectors are computable. They simply do not produce the predicted effects when applied.
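The optimal-linear-correction step the paper reports as failing can be sketched in a few lines of NumPy. The matrix entries below are illustrative placeholders, not the paper's measured responses (the source reports only the determinant, −84.5, and condition number, 21.2, of its real matrix):

```python
import numpy as np

# Hypothetical 4x4 cross-domain response matrix R: R[i, j] is the entropy
# change (%) in domain i when noise is shaped to domain j's direction.
# Values are illustrative, not the paper's measurements.
domains = ["medical", "legal", "code", "science"]
R = np.array([
    [-1.9, -0.8, -0.5, -0.4],
    [-10.7, -3.2, -1.1, -0.9],
    [-2.4, -1.5, -4.0, -1.2],
    [-1.8, -1.0, -0.7, -3.5],
])

# Desired effect: reduce medical entropy by 5%, leave the others unchanged.
target = np.array([-5.0, 0.0, 0.0, 0.0])

# Optimal linear correction: solve R @ w = target for injection weights w.
# An invertible, well-conditioned R guarantees w exists and is numerically
# stable -- but not that the underlying system responds linearly to w.
print(f"det(R) = {np.linalg.det(R):.2f}, cond(R) = {np.linalg.cond(R):.1f}")
w = np.linalg.solve(R, target)
print("injection weights:", np.round(w, 3))

# The linear model reproduces the target exactly; the paper's point is that
# the real system does not, because R is a terminal measurement of
# nonlinear mixing at intermediate layers.
assert np.allclose(R @ w, target)
```

This makes the failure mode concrete: every step of the linear algebra succeeds, so the gap between predicted and observed effects must lie outside the linear model itself.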
The Terminal Measurement Problem
The root cause is identified as the terminal measurement problem: the response matrix is a terminal measurement of a process that occurs at intermediate layers. It characterizes the system's input-output mapping but cannot invert the nonlinear transformations that generate cross-domain bleed during the forward pass.
In the high-dimensional activation space (d = 3,584 at 7B), the concentration of measure guarantees that geometric orthogonality of directions is uninformative about functional overlap — nearly all pairs of vectors in high-dimensional space are nearly orthogonal regardless of their computational relationship. The bleed lives in the forward-pass topology, not in the input geometry.
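The near-orthogonality claim is easy to check numerically. A minimal sketch, assuming only that directions behave like independent random unit vectors in d = 3,584 (the names here are illustrative):

```python
import numpy as np

# For independent random unit vectors u, v in R^d, |<u, v>| concentrates
# near 0 at rate ~ 1/sqrt(d); the expected value is sqrt(2 / (pi * d)),
# which at d = 3584 is about 0.013. So near-orthogonality of INLP
# directions carries almost no information about functional overlap.
rng = np.random.default_rng(0)
d = 3584

def random_unit(rng, d):
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

cosines = np.array([
    abs(random_unit(rng, d) @ random_unit(rng, d))
    for _ in range(1000)
])

print(f"mean |cos| = {cosines.mean():.4f}")
print(f"max  |cos| = {cosines.max():.4f}")
```

Even the maximum over a thousand random pairs stays tiny, which is why pairwise orthogonality of learned directions is the default outcome at this dimensionality, not evidence of independence.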
This result constrains the entire class of direction-space intervention methods. Any technique that operates by projecting perturbations onto directions identified at a single layer — whether for steering, editing, or control — faces the same terminal measurement limit.
Key Findings
- Repetition loop breaking: 100% escape rate at both 3B and 7B with near-perfect token uniqueness (0.99+), outperforming temperature scaling
- Domain entropy reduction: Modest reductions (3-6% in target domains at 7B) confirm shaped activation perturbation influences output distributions
- Cross-domain bleed: Targeting the medical direction reduces legal entropy by 10.7% while reducing medical entropy by only 1.9% — selectivity is inverted
- All corrections fail: Scalar interference cancellation, subspace decomposition, and optimal linear correction via matrix inversion all produce effects uncorrelated with predictions
- Terminal measurement limit: Cross-domain response matrix characterizes the system but cannot invert nonlinear mixing at intermediate layers
- Concentration of measure: In d = 3,584 dimensions, INLP direction orthogonality is vacuous — geometric orthogonality does not imply functional independence
Methodology
The core mechanism registers a forward hook on each of the final four transformer layers. At each hooked layer, the hidden state is modified by adding Gaussian noise projected onto the target subspace spanned by the INLP domain directions. Three injection modes are tested: positive (boost the target domain), negative (perturb competing domains), and combined.
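The hook mechanism can be sketched as follows. This is a minimal PyTorch illustration, assuming a HuggingFace-style model with a `model.model.layers` list; `make_shaped_noise_hook` and the `scale` parameter are hypothetical names, not the paper's API, and only the positive injection mode is shown:

```python
import torch

def make_shaped_noise_hook(directions: torch.Tensor, scale: float = 0.1):
    """directions: (k, d) matrix of orthonormal INLP domain directions."""
    # Projector onto the k-dimensional target subspace: P = D^T D.
    P = directions.T @ directions  # (d, d), idempotent for orthonormal rows

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        # Isotropic Gaussian noise, projected onto the INLP subspace so the
        # perturbation moves only along domain-discriminative axes.
        noise = torch.randn_like(hidden) @ P
        hidden = hidden + scale * noise
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden

    return hook

# Illustrative registration on the final four transformer layers:
# for layer in model.model.layers[-4:]:
#     layer.register_forward_hook(make_shaped_noise_hook(inlp_directions))
```

The negative mode would project noise onto competing domains' directions instead, and the combined mode would sum both perturbations.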
Experiments span ten phases across four model scales (Qwen 2.5 at 0.5B, 1.5B, 3B, and 7B). Domain probes cover four domains — medical, legal, code, and science — with 160 prompts in total. The paper tests the efficacy of shaped noise on three applications and systematically attempts to correct the discovered selectivity failure.
Key References
- Ravfogel et al. (2020) — Iterative Nullspace Projection for bias removal, used here as domain basis
- McEntire (2026) — Structural Transfer: domain-invariant structural signatures, source of INLP directions
- McEntire (2026) — Communicative Variance: sufficient conditions for stochastic resonance benefit
- Meng et al. (2022) — Activation patching, related inference-time intervention technique
- Vershynin (2018) — High-dimensional probability, concentration of measure theory