Constellation-Indexed Model Composition: Query-Driven Parameter Mixing via Activation Fingerprints

Dynamic specialist model mixing via orthogonal activation decomposition

Executive Summary

Model composition — combining specialist models at inference time based on query requirements — has been limited by alignment failures that occur when specialist models trained on different domains produce incompatible activation geometries. This paper introduces constellation-indexed composition, a framework that resolves these failures by indexing specialists through a shared generalist activation space rather than attempting direct specialist-to-specialist alignment.

The key innovation is a two-stage process. First, Gram-Schmidt orthogonalization is applied to domain-specific activation centroids, reducing inter-domain cosine similarity from 0.91-0.97 (near-parallel, high interference) to approximately zero (orthogonal, no interference). Second, query-driven parameter mixing selects and weights specialists based on the query's projection onto these orthogonalized domain axes. This eliminates the catastrophic interference that plagued prior composition methods.
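The first stage can be sketched in a few lines of numpy. The toy centroids and function names below are illustrative, not taken from the paper; the point is only that Gram-Schmidt drives the cosine similarity between near-parallel domain centroids to zero:

```python
import numpy as np

def gram_schmidt(centroids):
    """Orthogonalize domain centroids (rows) via classical Gram-Schmidt."""
    basis = []
    for c in centroids:
        v = c.astype(float).copy()
        for b in basis:
            v -= (v @ b) * b          # strip the component along earlier axes
        v /= np.linalg.norm(v)        # unit-normalize the residual
        basis.append(v)
    return np.stack(basis)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two near-parallel toy centroids, mimicking the 0.91-0.97 regime:
# both are dominated by a shared direction plus small domain-specific noise.
rng = np.random.default_rng(0)
base = rng.normal(size=64)
c1 = base + 0.3 * rng.normal(size=64)
c2 = base + 0.3 * rng.normal(size=64)

basis = gram_schmidt(np.stack([c1, c2]))
# cosine(c1, c2) is high; cosine(basis[0], basis[1]) is ~0 after orthogonalization
```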

The framework achieves a 98.1% win rate over baselines, up from 20.6% without generalist-space indexing. At 7B scale, a stochastic resonance mechanism at sigma=0.020 rescues compositions that would otherwise fail, contributing an additional 90.7 points to aggregate performance. The framework is architecture-invariant across the GPT-2 and Qwen model families, demonstrating that the composition principle operates at the level of activation geometry rather than model-specific structure.

Key Contributions and Methodology

The paper addresses the fundamental problem in model composition: specialist models trained independently on different domains develop correlated activation geometries (cosine similarity 0.91-0.97 between domain centroids), making it impossible to separate their contributions during mixing. Prior work attempted to resolve this through careful fine-tuning schedules or post-hoc alignment, both of which scale poorly.

The generalist-space indexing approach sidesteps this problem entirely. A generalist model — the common ancestor from which all specialists were fine-tuned — provides a shared coordinate system. Specialist activations are projected into this space, and Gram-Schmidt orthogonalization is applied to the resulting domain centroids. This produces an orthogonal basis where each domain occupies a distinct subspace, enabling clean decomposition of any query's domain requirements.
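The query-side decomposition can be sketched as follows. The `mixing_weights` helper, the temperature parameter, and the softmax normalization are assumptions for illustration, not the paper's stated API; the core idea is projecting a query's generalist-space activation onto the orthogonalized domain axes:

```python
import numpy as np

def mixing_weights(query_act, ortho_axes, temperature=1.0):
    """Project a query's generalist-space activation onto orthogonalized
    domain axes and convert the projections into specialist mixing weights.
    (Softmax normalization is an illustrative choice, not from the paper.)"""
    proj = ortho_axes @ query_act            # signed projection per domain axis
    scores = np.abs(proj) / temperature      # projection magnitude = relevance
    w = np.exp(scores - scores.max())        # numerically stable softmax
    return w / w.sum()

# Toy orthonormal axes for three domains in an 8-dim generalist space.
axes = np.eye(8)[:3]
query = np.array([2.0, 0.5, 0.1, 0, 0, 0, 0, 0])  # mostly a domain-0 query
w = mixing_weights(query, axes)
# w sums to 1 and puts the largest weight on domain 0
```

Because the axes are orthogonal, each projection isolates one domain's contribution; with the original near-parallel centroids, the same projections would overlap and interfere.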

The stochastic resonance finding is unexpected. At 7B scale, adding small amounts of Gaussian noise (sigma=0.020) to the composition weights improves performance by rescuing near-boundary queries that deterministic mixing assigns incorrectly. This connects to the broader stochastic resonance literature, where noise injection improves signal detection in nonlinear systems operating near a threshold.
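A minimal sketch of the noise-injection mechanism, under assumptions the summary does not specify (that composition weights lie on the simplex and are renormalized after perturbation): small Gaussian noise at sigma=0.020 occasionally flips the argmax for near-boundary queries, which is the rescue behavior described above.

```python
import numpy as np

def noisy_mixing(weights, sigma=0.020, seed=None):
    """Perturb composition weights with small Gaussian noise, then
    renormalize back onto the simplex. Clipping and renormalization are
    illustrative assumptions, not the paper's stated procedure."""
    rng = np.random.default_rng(seed)
    noisy = weights + rng.normal(scale=sigma, size=weights.shape)
    noisy = np.clip(noisy, 0.0, None)        # keep weights non-negative
    return noisy / noisy.sum()

# A near-boundary query: deterministic mixing slightly favors domain 1.
w = np.array([0.49, 0.51])
samples = [noisy_mixing(w, sigma=0.020, seed=s) for s in range(100)]
flips = sum(s[0] > s[1] for s in samples)    # noise sometimes flips the argmax
```

For weights far from the boundary the noise is too small to change the outcome, which matches the threshold character of stochastic resonance: noise only helps signals sitting near the decision boundary.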

Key Findings

  • Alignment failure resolution: Generalist-space indexing raises composition win rate from 20.6% to 98.1%, eliminating the dominant failure mode in model composition
  • Orthogonalization effectiveness: Gram-Schmidt reduces domain centroid cosine similarity from 0.91-0.97 to approximately zero, creating interference-free composition subspaces
  • Stochastic resonance rescue: Noise injection at sigma=0.020 rescues composition at 7B scale, contributing +90.7 points to aggregate performance
  • Architecture invariance: The composition framework generalizes across GPT-2 and Qwen model families without architecture-specific modifications
  • Scale behavior: Composition quality improves with model scale as activation geometries become more structured and separable

Key References

Ilharco, G., et al. (2023). Editing Models with Task Arithmetic. International Conference on Learning Representations.

Yadav, P., et al. (2023). TIES-Merging: Resolving Interference When Merging Models. Advances in Neural Information Processing Systems.

Wortsman, M., et al. (2022). Model Soups: Averaging Weights of Multiple Fine-Tuned Models Improves Accuracy Without Increasing Inference Time. International Conference on Machine Learning.

Gammaitoni, L., et al. (1998). Stochastic Resonance. Reviews of Modern Physics, 70(1), 223-287.

Matena, M. S., & Raffel, C. (2022). Merging Models with Fisher-Weighted Averaging. Advances in Neural Information Processing Systems.
