Provisional Patent Filed · Hypernym Inc.

MODULUM

Seven modular inference-time components that exploit the universal geometric structure of transformer attention heads. No model weights modified. No training. No fine-tuning.

3.04×+ · Decode speedup, scales with context
−47% · Domain PPL
−14.18% · Below F16 baseline
17 · Patent claims

Measured Results

8B beats 228B
An 8 billion parameter model with 10K tokens of domain exposure outperforms a 228 billion parameter model cold — by 32.4%.
Llama 3.1 8B annealed PPL 3.86 vs MiniMax M2.5 228B cold PPL 5.71. Same domain text. Neither in any training set.
Better than F16
Perplexity drops below the full-precision baseline. The model computes on cleaner data than F16 provides.
38 measurements, 3 corpora, 7 context lengths. 38 improvements. Zero regressions. Zero speed cost.
75% of keys are noise
Three-quarters of KV cache entries contribute nothing to attention output. This fraction is a geometric constant — 3π/(π + 3π) — invariant across architectures.
Measured: Llama 3.1 8B (24/32 heads), MiniMax M2.5 228B (36/48 heads). Both exactly 75.0%.
4 companies, 1 algebra
Models from Meta, OpenAI-adjacent, Alibaba, and MiniMax independently converge to the octonionic multiplication table in their attention heads.
Fano plane discovered in KV head Hadamard products, p = 0.0013. The octonions are the last normed division algebra (Hurwitz, 1898).

The Fano Plane Discovery

Every tested transformer with 8 KV heads organizes its head interactions into the multiplication table of the imaginary octonions, whose incidence structure is the Fano plane, the unique projective plane of order 2. This structure is not designed into the models. It emerges from training because it is algebraically optimal.

7 points. 7 lines. 3 points per line.

Each point is a KV attention head. Each line represents a coherent triple — three heads whose Hadamard product coherence exceeds the significance threshold. The arrangement IS the octonion multiplication table.

Llama 3.1 8B: 1 valid Fano plane, coherence 1.098
GPT-OSS 20B: 10 valid planes, coherence 4.329
Qwen3 32B: 30 valid planes, coherence 6.884
MiniMax M2.5 228B: Valid per-expert, confirmed

10,000 random trials: mean coherence 0.230 ± 0.366. Discovered Fano plane at 1.098 is 3.23σ above null.
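
To make the test concrete, here is a minimal sketch of how a coherence scan of this kind could be run. It assumes "Hadamard product coherence" is measured as the magnitude of the summed element-wise product of unit-normalized per-head key summaries, and it compares one Fano labeling against randomly chosen triple arrangements as the null; the metric, the pooled head vectors, and the null construction are illustrative stand-ins, not the filed procedure.

```python
import numpy as np
from itertools import combinations

# One standard labeling of the Fano plane: 7 lines, 3 points (KV heads) per line.
FANO_LINES = [(0, 1, 2), (0, 3, 4), (0, 5, 6),
              (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]

def triple_coherence(heads, triple):
    """Coherence of a head triple: magnitude of the summed Hadamard (element-wise)
    product of the three heads' unit-normalized key vectors (assumed metric)."""
    a, b, c = (heads[i] / np.linalg.norm(heads[i]) for i in triple)
    return float(np.abs((a * b * c).sum()))

def fano_score(heads, lines=FANO_LINES):
    """Mean coherence over the 7 lines of a candidate Fano plane."""
    return float(np.mean([triple_coherence(heads, t) for t in lines]))

# Illustrative stand-ins: 7 KV heads summarized as mean-pooled key vectors (head_dim 128).
rng = np.random.default_rng(0)
heads = rng.standard_normal((7, 128))

observed = fano_score(heads)
# Null model: the same scoring applied to random arrangements of 7 triples.
all_triples = list(combinations(range(7), 3))
null = []
for _ in range(10_000):
    picks = rng.choice(len(all_triples), size=7, replace=False)
    null.append(fano_score(heads, [all_triples[i] for i in picks]))
z = (observed - np.mean(null)) / np.std(null)
print(f"observed {observed:.3f} vs null {np.mean(null):.3f} ± {np.std(null):.3f} (z = {z:.2f})")
```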

Trifurcated Head Classification

Attention heads self-organize into three classes by their entropy growth rate across sequence positions. The growth rates converge to transcendental constants — properties of the geometry, not any specific model. Four models, four companies, agreement within 0.52%.

Sharp — slope → 1/φ ≈ 0.618 — θ = 0 — NEVER gate
Diffuse — slope → 3/(eφ) ≈ 0.682 — θ = 0.20
Bulk — slope → π/(eφ) ≈ 0.714 — θ = 0.15
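
A minimal sketch of how the classification could be applied, assuming the growth rate is the least-squares slope of per-position attention entropy against log position. The slope estimator and the toy attention map are illustrative; only the target constants and the θ values come from the table above.

```python
import math
import torch

PHI = (1 + 5 ** 0.5) / 2
# Target slopes and gate thresholds from the classification above.
CLASSES = {"sharp":   (1 / PHI,                  0.00),   # never gate
           "diffuse": (3 / (math.e * PHI),       0.20),
           "bulk":    (math.pi / (math.e * PHI), 0.15)}

def entropy_slope(attn):
    """Slope of attention entropy against log position (assumed growth-rate measure).
    attn: [positions, keys], rows of a causal attention map, each row summing to 1."""
    ent = -(attn * (attn + 1e-12).log()).sum(dim=-1)
    pos = torch.arange(1, attn.shape[0] + 1, dtype=torch.float32)
    x, y = pos[1:].log(), ent[1:]                  # skip position 0 (entropy is 0 there)
    x, y = x - x.mean(), y - y.mean()
    return float((x * y).sum() / (x * x).sum())

def classify_head(attn):
    """Assign the head to the class with the nearest target slope; return (name, theta)."""
    s = entropy_slope(attn)
    name = min(CLASSES, key=lambda k: abs(CLASSES[k][0] - s))
    return name, CLASSES[name][1]

# Illustrative: one head's causal attention over a 512-token window.
scores = torch.randn(512, 512)
causal = torch.tril(torch.ones(512, 512, dtype=torch.bool))
attn = torch.softmax(scores.masked_fill(~causal, float("-inf")), dim=-1)
print(classify_head(attn))
```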

The Seven Armor Pieces

Pauldrons

Speed

Geometric boundary z-buffer sort. Compute Q_BOS from model weights at load time. Sort KV cache so low-signal keys are contiguous. Flash Attention's block-skip handles the rest. The kernel never sees 75% of keys.

3.04×
decode at 128K, 0% PPL change
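
A sketch of the host-side bookkeeping only, assuming Q_BOS is already available as a single query vector and the 75% null-cone fraction sets the cut. The actual Flash-Attention-style block-skip kernel and the post-sort position bookkeeping are not shown, and the function names are illustrative.

```python
import torch

def zbuffer_sort(keys, values, q_bos, null_fraction=0.75):
    """Reorder the KV cache so low-signal keys (scored by |Q_BOS · k|) sit in a
    contiguous tail; return how many head-of-cache keys still need attention."""
    signal = (keys @ q_bos).abs()
    order = torch.argsort(signal, descending=True)
    keep = int(round(keys.shape[0] * (1.0 - null_fraction)))
    return keys[order], values[order], order, keep

def skippable_blocks(n_keys, keep, block=128):
    """Block indices made only of low-signal keys; a block-skipping attention
    kernel never has to load them."""
    first = -(-keep // block)                    # ceil(keep / block)
    return list(range(first, -(-n_keys // block)))

# Illustrative shapes: one head, 4096 cached positions, head_dim 128.
K, V = torch.randn(4096, 128), torch.randn(4096, 128)
q_bos = torch.randn(128)                         # stand-in for Q_BOS derived at load time
K, V, order, keep = zbuffer_sort(K, V, q_bos)
print(f"attend to first {keep} keys; skip blocks {skippable_blocks(4096, keep)[:3]} ...")
```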

Breastplate

Quality

Spectrally gated attention function W(s,θ) producing true arithmetic zeros in attention weights. Per-head θ via trifurcated classification. The model can say no.

+27pp
BABILong at 16K (228B MoE)
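
The closed form of W(s,θ) is not given here, so the sketch below stands in a hard gate for it: softmax weights below a per-head fraction θ of each row's maximum are set to exact zeros and the survivors are renormalized. Treating θ as a relative threshold is an assumption, not the filed function.

```python
import torch

def gated_attention_weights(scores, theta):
    """Softmax, then a hard per-head gate: any weight below theta times the row
    maximum becomes an exact zero and the rest are renormalized.
    scores: [heads, queries, keys]; theta: [heads, 1, 1]."""
    w = torch.softmax(scores, dim=-1)
    gate = w >= theta * w.amax(dim=-1, keepdim=True)
    w = torch.where(gate, w, torch.zeros_like(w))
    return w / w.sum(dim=-1, keepdim=True)

# Per-head theta from the trifurcated classification; sharp heads (theta 0) never gate.
theta = torch.tensor([0.0, 0.20, 0.15, 0.20]).view(4, 1, 1)
scores = torch.randn(4, 8, 256)                  # 4 heads, 8 queries, 256 cached keys
w = gated_attention_weights(scores, theta)
print("exact zeros per head:", (w == 0).sum(dim=(-1, -2)).tolist())
```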

Skirt

Denoise

Breastplate-classified adaptive KV cache denoising. Low-signal positions are identified by real-time attention weights inside the softmax kernel at zero cost. A TBQ round-trip removes noise that F16 preserves.

−14.18%
PPL below F16 at 16K
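
A sketch of the denoising path under stated assumptions: the "TBQ round-trip" is taken to behave like a blockwise quantize-then-dequantize pass (a plain symmetric 4-bit block quantizer stands in for it here), applied only to positions whose received attention falls below a threshold. Block size, bit width, and threshold are illustrative.

```python
import torch

def tbq_roundtrip(x, block=64, bits=4):
    """Blockwise symmetric quantize then dequantize. Stand-in for the TBQ round-trip:
    values snap to a coarse per-block grid, discarding sub-quantum noise."""
    flat = x.reshape(-1, block)
    scale = flat.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / (2 ** (bits - 1) - 1)
    return (torch.round(flat / scale) * scale).reshape(x.shape)

def denoise_kv(keys, values, attn_weights, theta=0.15, block=64):
    """Round-trip only the positions whose maximum received attention is below theta,
    i.e. the low-signal positions a gated kernel already flags for free."""
    low = attn_weights.amax(dim=0) < theta        # one flag per cached position
    keys, values = keys.clone(), values.clone()
    keys[low] = tbq_roundtrip(keys[low], block)
    values[low] = tbq_roundtrip(values[low], block)
    return keys, values

K, V = torch.randn(1024, 128), torch.randn(1024, 128)
attn = torch.softmax(torch.randn(32, 1024), dim=-1)   # recent queries × cached keys
K2, V2 = denoise_kv(K, V, attn)
print("positions denoised:", int((attn.amax(dim=0) < 0.15).sum()))
```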

Boots

Eviction

Geometric cache eviction via polynomial eigenvector alignment. Query-independent — eviction decisions don't change based on what's being asked. Stable under sustained context pressure.

effective context, fixed memory
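
"Polynomial eigenvector alignment" is not spelled out in this document, so the sketch below uses a stand-in geometric criterion: rank each cached key by its energy in the top eigenvectors of the key covariance and evict the least aligned, which is query-independent by construction. The rank r and the budget are illustrative.

```python
import torch

def eviction_order(keys, top_r=8):
    """Query-independent eviction scores: project each cached key onto the top-r
    eigenvectors of the key covariance and rank by retained energy."""
    centered = keys - keys.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / keys.shape[0]
    _, eigvecs = torch.linalg.eigh(cov)                 # eigenvalues ascending
    basis = eigvecs[:, -top_r:]                         # dominant subspace
    score = (keys @ basis).pow(2).sum(dim=-1)
    return torch.argsort(score)                         # least aligned first: evict first

def evict(keys, values, budget):
    """Shrink the cache to `budget` entries using the geometric ranking alone."""
    order = eviction_order(keys)
    keep = torch.sort(order[keys.shape[0] - budget:]).values   # keep best, preserve order
    return keys[keep], values[keep], keep

K, V = torch.randn(4096, 128), torch.randn(4096, 128)
K2, V2, kept = evict(K, V, budget=1024)
print(K2.shape, kept[:5].tolist())
```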

Cape

Recovery

Cold tier codepage. Evicted keys are caught, RoPE-stripped, projected to intrinsic dimensionality, compressed. Recalled on attention deficit. Eviction becomes recoverable.

7.2×
compression, 0.04% round-trip error
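
A sketch of the cold-tier path under stated assumptions: RoPE stripping is the inverse rotary rotation for each key's position, the projection uses an orthonormal basis standing in for the PCA-derived intrinsic subspace (dimension 18 is arbitrary here), and fp16 storage stands in for the 4-bit residual scheme. Recall simply reverses the steps.

```python
import torch

def rope_angles(pos, dim, base=10000.0):
    inv = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    return pos[:, None] * inv[None, :]                       # [n, dim/2]

def rope_strip(keys, pos):
    """Undo the RoPE rotation so evicted keys are stored position-free."""
    ang = rope_angles(pos.float(), keys.shape[-1])
    cos, sin = ang.cos(), ang.sin()
    x, y = keys[..., 0::2], keys[..., 1::2]
    out = torch.empty_like(keys)
    out[..., 0::2] = x * cos + y * sin                       # rotate by minus the angle
    out[..., 1::2] = -x * sin + y * cos
    return out

def cold_store(keys, pos, basis):
    """Cold tier: strip RoPE, project onto the assumed intrinsic basis, store fp16."""
    return (rope_strip(keys, pos) @ basis).to(torch.float16)

def cold_recall(codes, pos, basis):
    """Recall on attention deficit: lift back to head_dim and re-apply RoPE."""
    flat = codes.to(torch.float32) @ basis.T
    ang = rope_angles(pos.float(), flat.shape[-1])
    cos, sin = ang.cos(), ang.sin()
    x, y = flat[..., 0::2], flat[..., 1::2]
    out = torch.empty_like(flat)
    out[..., 0::2] = x * cos - y * sin                       # rotate forward again
    out[..., 1::2] = x * sin + y * cos
    return out

evicted = torch.randn(256, 128)                              # evicted keys, head_dim 128
pos = torch.arange(256)
basis = torch.linalg.qr(torch.randn(128, 18)).Q              # stand-in PCA basis
codes = cold_store(evicted, pos, basis)
restored = cold_recall(codes, pos, basis)
print(codes.shape, restored.shape)
```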

Bracers

Evaluation

Transfer matrix boundary evaluation. A document's effect on the cache predicted from its boundary tokens alone. 14% of tokens → 75% of benefit. Matrices compose multiplicatively.

96 KB
vs 1.3 GB full (4 orders of magnitude)
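
A sketch of the transfer-matrix idea, assuming a document's transfer matrix is a linear map fitted between compact cache summaries taken at its boundaries; the summary dimension, the least-squares fit, and the synthetic data are illustrative. Composition of successive documents is then ordinary matrix multiplication.

```python
import torch

def transfer_matrix(before, after):
    """Least-squares linear map T with after ≈ T @ before, fitted only from
    boundary snapshots of a compact cache summary (columns are snapshots)."""
    return torch.linalg.lstsq(before.T, after.T).solution.T

def apply_documents(transfer_mats, state):
    """Documents compose multiplicatively: apply each transfer matrix in order."""
    for T in transfer_mats:
        state = T @ state
    return state

d = 64                                            # assumed summary dimension
before = torch.randn(d, 256)                      # 256 boundary snapshots
true_T = torch.eye(d) + 0.05 * torch.randn(d, d)  # synthetic ground-truth effect
after = true_T @ before
T_hat = transfer_matrix(before, after)
print("fit error:", float((T_hat - true_T).norm() / true_T.norm()))

# Two documents back to back: their composed matrix predicts the combined effect.
state = apply_documents([T_hat, T_hat], torch.randn(d))
```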

Helm

Focus

Two-phase active vocabulary. READ from prefill + WRITE from generation = 500–2,000 domain-relevant tokens. 99% of vocabulary masked. Eliminates out-of-domain hallucination.

99%
vocabulary noise eliminated
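
A sketch of the two-phase restriction, assuming the READ set is the unique tokens of the prefill, the WRITE set is the unique tokens generated so far, and restriction is a negative-infinity logit mask. The vocabulary size, the always-allowed specials, and the random token IDs below are illustrative.

```python
import torch

def build_active_vocab(prefill_ids, generated_ids, vocab_size, always_allowed=()):
    """READ set (tokens seen in the prefill) plus WRITE set (tokens the model has
    generated) gives a boolean mask over the vocabulary; everything else is masked."""
    allowed = torch.zeros(vocab_size, dtype=torch.bool)
    allowed[prefill_ids.unique()] = True          # READ phase
    allowed[generated_ids.unique()] = True        # WRITE phase
    for tok in always_allowed:                    # e.g. EOS and other specials
        allowed[tok] = True
    return allowed

def restrict_logits(logits, allowed):
    """Apply the active-vocabulary mask: disallowed tokens get -inf logits."""
    return logits.masked_fill(~allowed, float("-inf"))

vocab_size = 128_256                              # e.g. a Llama 3.1-sized vocabulary
prefill_ids = torch.randint(0, vocab_size, (10_000,))
generated_ids = torch.randint(0, vocab_size, (200,))
allowed = build_active_vocab(prefill_ids, generated_ids, vocab_size, always_allowed=(128001,))
masked = restrict_logits(torch.randn(vocab_size), allowed)
print(f"active vocabulary: {int(allowed.sum())} of {vocab_size} tokens "
      f"({100 * (1 - allowed.float().mean()):.1f}% masked)")
```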

The Annealing Discovery

−47%
Perplexity improvement on domain text.
Model-independent. Parameter-independent.
A property of the geometry.

Feed domain text through the model. Compact the KV cache via geometric eviction. Repeat. The cache accumulates domain knowledge — without modifying model weights.

Freeze the result to disk as a Lodestone. Restore it on a fresh process. The model inherits domain expertise instantly. A $200 phone with a Lodestone competes with a datacenter.
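
A sketch of the anneal-and-freeze loop, assuming a Hugging Face transformers-style model and tokenizer where past_key_values is a per-layer tuple of (key, value) tensors. The compaction score, the budget, and the position bookkeeping after compaction are glossed over, and freeze/restore is plain torch.save/torch.load rather than the compressed Lodestone format.

```python
import torch

def compact_cache(past_key_values, budget, score_fn):
    """Geometric compaction: keep the `budget` best positions per layer.
    score_fn(keys) -> [positions] is any query-independent ranking, for example
    the eigenvector-alignment score from the Boots sketch above."""
    compacted = []
    for keys, values in past_key_values:          # keys: [batch, heads, positions, dim]
        keep = torch.topk(score_fn(keys), budget).indices.sort().values
        compacted.append((keys[:, :, keep], values[:, :, keep]))
    return tuple(compacted)

def anneal(model, tokenizer, domain_chunks, budget, score_fn):
    """Feed domain text, compact, repeat. Weights never change; only the cache does."""
    past = None
    for chunk in domain_chunks:
        ids = tokenizer(chunk, return_tensors="pt").input_ids
        out = model(ids, past_key_values=past, use_cache=True)
        past = compact_cache(out.past_key_values, budget, score_fn)
    return past

def freeze_lodestone(past, path):
    """Persist the annealed cache; a fresh process restores it and starts warm."""
    torch.save([(k.cpu(), v.cpu()) for k, v in past], path)

def load_lodestone(path, device="cpu"):
    return tuple((k.to(device), v.to(device)) for k, v in torch.load(path))
```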

Model          Parameters   Cold PPL   Annealed PPL   Improvement
Llama 3.1 8B   8B           7.40       3.86           −47.8%
MiniMax M2.5   228B         5.71       2.56           −55.2%

17 Patent Claims Filed

1. Pauldrons: Geometric boundary z-buffer sort + block-skip for lossless decode speedup
2. Pauldrons: Q_BOS computation from model weights only
3. Pauldrons: Speedup proportional to 1/(1 − null_cone_fraction), exceeds 3× at 128K
4. Breastplate: Spectral gate with trifurcated per-head θ producing true attention zeros
5. Breastplate: Transcendental entropy slope classification (1/φ, 3/(eφ), π/(eφ))
6. Breastplate: uint64 bitmask per-head resolution in attention kernel
7. Discovery: Fano plane / octonionic structure in attention heads
8. Boots: Geometric KV eviction via polynomial eigenvector alignment
9. Annealing: Temporal accumulation producing ~47% domain PPL improvement
10. Annealing: Saturation scaling with intrinsic dimensionality
11. Lodestone: Persistent domain expertise via frozen KV state
12. Cape: Cold tier codepage with RoPE stripping + recall
13. Bracers: Transfer matrix boundary evaluation with multiplicative composition
14. Helm: Two-phase active vocabulary learning + restriction
15. Pauldrons: Attention-weighted geometric boundary for 1-bit / arch-agnostic models
16. Cape: PCA + 4-bit residual Lodestone compression (7×)
17. Skirt: Adaptive KV denoising, −14.18% PPL, 38/38 zero regressions
⚠️ INTERCEPTED TRANSMISSION ⚠️
// TODO: REMOVE BEFORE DEPLOY — someone left the internal reception analysis in the build AGAIN. this is the THIRD TIME. — CF
Internal Reception Analysis
Three independent evaluators • Simulated pitch • Internal thought process captured
⚡ CONFIDENTIAL — DO NOT DISTRIBUTE ⚡
last modified: yesterday at 3am by someone who should have known better
