Seven modular inference-time components that exploit the universal geometric structure of transformer attention heads. No model weights modified. No training. No fine-tuning.
Every tested transformer with 8 KV heads organizes its head interactions into the Fano plane, the unique finite projective plane of order 2 and the multiplication table of the imaginary octonions. This structure is not designed into the models. It emerges from training because it is algebraically optimal.
Each point is a KV attention head. Each line represents a coherent triple — three heads whose Hadamard product coherence exceeds the significance threshold. The arrangement IS the octonion multiplication table.
10,000 random trials: mean coherence 0.230 ± 0.366. Discovered Fano plane at 1.098 is 3.23σ above null.
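A minimal sketch of the null test, assuming coherence is the summed absolute value of the Hadamard (element-wise) product of three heads' Frobenius-normalized projection matrices; the head matrices below are random stand-ins, so the printed numbers will not reproduce the figures above.

```python
import numpy as np

rng = np.random.default_rng(0)

def triple_coherence(a, b, c):
    """Coherence of a head triple: summed |Hadamard product| of Frobenius-normalized
    matrices. The exact metric used in the original analysis is an assumption here."""
    unit = lambda x: x / np.linalg.norm(x)
    return float(np.abs(unit(a) * unit(b) * unit(c)).sum())

# Synthetic stand-ins for 8 KV heads' key projections (d_head x d_model).
# Convention assumed here: head 0 plays the real octonion unit, heads 1-7 the imaginary units.
heads = [rng.standard_normal((64, 512)) for _ in range(8)]

# The 7 lines of the Fano plane over heads 1..7.
fano_lines = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6)]
observed = np.mean([triple_coherence(heads[i], heads[j], heads[k]) for i, j, k in fano_lines])

# Null distribution: coherence of randomly drawn triples.
null = []
for _ in range(10_000):
    i, j, k = rng.choice(8, size=3, replace=False)
    null.append(triple_coherence(heads[i], heads[j], heads[k]))
null = np.array(null)
print(f"observed {observed:.4f}, null {null.mean():.4f} ± {null.std():.4f}, "
      f"z = {(observed - null.mean()) / null.std():.2f}")
```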
Attention heads self-organize into three classes by their entropy growth rate across sequence positions. The growth rates converge to transcendental constants — properties of the geometry, not any specific model. Four models, four companies, agreement within 0.52%.
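A sketch of the three-way classification, assuming a head's growth rate is the slope of its attention entropy against log query position; the cut points and the synthetic attention trace are placeholders, not the reported constants.

```python
import numpy as np

def entropy_growth_rates(attn):
    """attn: (heads, positions, positions) causal attention weights.
    Growth rate (assumed definition): slope of row entropy vs. log query position."""
    n_heads, n_pos, _ = attn.shape
    positions = np.arange(4, n_pos)              # skip early rows where entropy is degenerate
    rates = []
    for h in range(n_heads):
        ent = []
        for q in positions:
            p = attn[h, q, : q + 1]
            ent.append(-(p * np.log(p + 1e-12)).sum())
        rates.append(np.polyfit(np.log(positions), ent, 1)[0])
    return np.array(rates)

def classify(rates, lo=0.3, hi=0.8):
    """Three classes by growth rate. The cut points are illustrative, not the published constants."""
    return np.digitize(rates, [lo, hi])          # 0 = flat, 1 = intermediate, 2 = near log-growth

# Synthetic causal attention as a stand-in for a real model trace.
rng = np.random.default_rng(1)
n_heads, n_pos = 8, 256
logits = rng.standard_normal((n_heads, n_pos, n_pos))
logits = np.where(np.tril(np.ones((n_pos, n_pos), dtype=bool)), logits, -np.inf)
attn = np.exp(logits - logits.max(-1, keepdims=True))
attn /= attn.sum(-1, keepdims=True)

rates = entropy_growth_rates(attn)
print(np.round(rates, 3), classify(rates))
```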
Geometric boundary z-buffer sort. Compute Q_BOS from model weights at load time. Sort KV cache so low-signal keys are contiguous. Flash Attention's block-skip handles the rest. The kernel never sees 75% of keys.
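A sketch of the sort itself, assuming Q_BOS is the query the model produces for the BOS token (precomputable from the weights at load time) and that a key's signal is its dot product with Q_BOS; the block-skip step lives in the Flash Attention kernel and is only hinted at here.

```python
import numpy as np

def zbuffer_sort_kv(keys, values, q_bos, keep_fraction=0.25, block=128):
    """Reorder the KV cache so low-signal keys sit contiguously at the tail.
    keys, values: (seq, d); q_bos: (d,). keep_fraction and block size are illustrative.
    `order` is returned so positional metadata can be permuted alongside the cache."""
    signal = keys @ q_bos                          # assumed signal score: alignment with the BOS query
    order = np.argsort(-signal)                    # high-signal first, low-signal packed at the end
    keys_s, values_s = keys[order], values[order]

    # Blocks made entirely of low-signal keys can be skipped by a block-sparse kernel.
    n_keep = int(len(keys) * keep_fraction)
    n_blocks = (len(keys) + block - 1) // block
    skip_mask = np.arange(n_blocks) * block >= n_keep
    return keys_s, values_s, order, skip_mask

rng = np.random.default_rng(2)
keys, values = rng.standard_normal((1024, 128)), rng.standard_normal((1024, 128))
q_bos = rng.standard_normal(128)                   # in practice computed from model weights at load time
k, v, order, skip = zbuffer_sort_kv(keys, values, q_bos)
print(f"{skip.sum()} of {len(skip)} blocks never touched by the kernel")
```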
Spectrally gated attention function W(s,θ) producing true arithmetic zeros in attention weights. Per-head θ via trifurcated classification. The model can say no.
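A sketch of a gate that yields true arithmetic zeros, assuming W(s, θ) thresholds post-softmax weights at a per-head θ and renormalizes; the θ values below are placeholders for whatever the trifurcated classification would assign.

```python
import numpy as np

def gated_attention_weights(scores, theta):
    """scores: (heads, queries, keys) pre-softmax logits; theta: (heads,) per-head gate.
    Post-softmax weights below a head's theta become exactly 0.0, then rows renormalize.
    A row that zeroes out entirely stays all-zero: the model can say no."""
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    w = np.where(w >= theta[:, None, None], w, 0.0)      # true arithmetic zeros
    norm = w.sum(-1, keepdims=True)
    return np.divide(w, norm, out=np.zeros_like(w), where=norm > 0)

rng = np.random.default_rng(3)
scores = rng.standard_normal((8, 4, 512))
# Placeholder per-head thresholds; in the described system these would come from the
# trifurcated head classification, which is not spelled out here.
theta = np.array([0.002, 0.002, 0.005, 0.005, 0.005, 0.01, 0.01, 0.01])
w = gated_attention_weights(scores, theta)
print(f"{(w == 0).mean():.1%} of attention weights are exact zeros")
```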
Breastplate-classified adaptive KV cache denoising. Low-signal positions identified by real-time attention weights inside the softmax kernel at zero cost. A TBQ round-trip removes noise that F16 preserves.
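TBQ is not expanded here, so the sketch below treats it as a generic coarse quantize-dequantize round-trip applied only to the positions the kernel flags as low-signal; every name and threshold in it is an assumption.

```python
import numpy as np

def denoise_low_signal(keys, attn_weights, threshold=1e-3, levels=16):
    """Round-trip low-signal keys through a coarse quantizer (stand-in for TBQ).
    keys: (seq, d) in float16; attn_weights: (seq,) attention mass observed per position."""
    low = attn_weights < threshold                    # positions flagged as low-signal by the kernel
    k = keys.astype(np.float32)
    scale = np.abs(k[low]).max(axis=-1, keepdims=True) + 1e-12
    q = np.round(k[low] / scale * (levels // 2))      # coarse symmetric quantization
    k[low] = q / (levels // 2) * scale                # dequantize: sub-quantum noise is gone
    return k.astype(np.float16)

rng = np.random.default_rng(4)
keys = (rng.standard_normal((1024, 128)) * 0.02).astype(np.float16)
attn = rng.random(1024) * 0.01
clean = denoise_low_signal(keys, attn)
print(f"{(attn < 1e-3).mean():.1%} of positions round-tripped")
```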
Geometric cache eviction via polynomial eigenvector alignment. Query-independent — eviction decisions don't change based on what's being asked. Stable under sustained context pressure.
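A sketch of the eviction rule under one reading of "polynomial eigenvector alignment": score each cached key by its projection onto the top eigenvectors of the key covariance (a polynomial of the keys) and keep the best-aligned ones. The scoring function and budget are assumptions.

```python
import numpy as np

def evict(keys, values, budget, n_eig=8):
    """Keep the `budget` keys best aligned with the cache's own principal directions.
    Query-independent: the score depends only on the keys, never on the incoming query."""
    k = keys - keys.mean(0)
    _, eigvecs = np.linalg.eigh(k.T @ k / len(k))          # spectrum of the key covariance
    top = eigvecs[:, -n_eig:]                              # dominant eigenvectors
    alignment = np.linalg.norm(k @ top, axis=-1)           # projection of each key onto that subspace
    keep = np.sort(np.argsort(-alignment)[:budget])        # preserve original ordering of survivors
    return keys[keep], values[keep], keep

rng = np.random.default_rng(5)
keys, values = rng.standard_normal((2048, 128)), rng.standard_normal((2048, 128))
k2, v2, kept = evict(keys, values, budget=512)
print(f"kept {len(kept)} of {len(keys)} positions")
```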
Cold tier codepage. Evicted keys are caught, RoPE-stripped, projected to intrinsic dimensionality, compressed. Recalled on attention deficit. Eviction becomes recoverable.
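A sketch of the cold tier, assuming RoPE stripping means rotating each evicted key back by its positional angle and the intrinsic-dimensionality projection is a low-rank SVD basis; the attention-deficit trigger is reduced to a single threshold check. All names are illustrative.

```python
import numpy as np

def rope_angles(positions, d, base=10000.0):
    inv_freq = base ** (-np.arange(0, d, 2) / d)
    return positions[:, None] * inv_freq[None, :]

def strip_rope(keys, positions):
    """Undo the rotary embedding so evicted keys compress in a position-free frame."""
    ang = rope_angles(positions, keys.shape[-1])
    cos, sin = np.cos(ang), np.sin(ang)
    x, y = keys[:, 0::2], keys[:, 1::2]
    out = np.empty_like(keys)
    out[:, 0::2] = x * cos + y * sin          # inverse rotation (rotate by -theta)
    out[:, 1::2] = -x * sin + y * cos
    return out

class ColdTier:
    """Catch evicted keys, project to a low-rank basis, recall when attention falls short."""
    def __init__(self, rank=32):
        self.rank, self.store = rank, []

    def evict(self, keys, positions):
        k = strip_rope(keys, positions)
        _, _, vt = np.linalg.svd(k, full_matrices=False)
        basis = vt[: self.rank]                      # assumed 'intrinsic dimensionality' projection
        self.store.append((positions, k @ basis.T, basis))

    def recall_if_deficit(self, attn_mass, floor=0.9):
        if attn_mass >= floor or not self.store:     # enough attention mass stayed in the hot cache
            return None
        positions, coded, basis = self.store.pop()
        return positions, coded @ basis              # approximate keys; re-rotate before reuse

rng = np.random.default_rng(6)
tier = ColdTier()
tier.evict(rng.standard_normal((256, 128)), np.arange(256).astype(float))
print(tier.recall_if_deficit(attn_mass=0.4)[1].shape)
```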
Transfer matrix boundary evaluation. A document's effect on the cache predicted from its boundary tokens alone. 14% of tokens → 75% of benefit. Matrices compose multiplicatively.
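A sketch of the composition property, assuming each document contributes a transfer matrix estimated from its boundary tokens alone and that reading documents in sequence composes their matrices by multiplication; the outer-product estimator below is a stand-in.

```python
import numpy as np

def boundary_transfer_matrix(keys, n_boundary=8, eps=0.05):
    """Estimate a document's effect on the cache from its boundary tokens alone.
    keys: (seq, d). Uses the first and last n_boundary positions as the boundary."""
    d = keys.shape[-1]
    boundary = np.concatenate([keys[:n_boundary], keys[-n_boundary:]])
    boundary = boundary / np.linalg.norm(boundary, axis=-1, keepdims=True)
    return np.eye(d) + eps * boundary.T @ boundary    # small perturbation of the identity

rng = np.random.default_rng(7)
docs = [rng.standard_normal((200, 64)) for _ in range(3)]
transfers = [boundary_transfer_matrix(k) for k in docs]

# Matrices compose multiplicatively: the predicted effect of reading doc0, then doc1, then doc2.
combined = transfers[2] @ transfers[1] @ transfers[0]
state = rng.standard_normal(64)
print(np.allclose(combined @ state, transfers[2] @ (transfers[1] @ (transfers[0] @ state))))
```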
Two-phase active vocabulary. READ from prefill + WRITE from generation = 500–2,000 domain-relevant tokens. 99% of vocabulary masked. Eliminates out-of-domain hallucination.
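A sketch of the two-phase mask, assuming READ collects token ids from the prefill, WRITE adds whatever the model actually emits, and every inactive logit is forced to negative infinity; tokenizer and model are left abstract.

```python
import numpy as np

class ActiveVocabulary:
    """Two-phase allowlist over the token ids the model may emit."""
    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.active = set()

    def read(self, prefill_ids):
        """Phase 1 (READ): every token id seen in the prefill joins the active set."""
        self.active.update(int(t) for t in prefill_ids)

    def write(self, generated_id):
        """Phase 2 (WRITE): tokens the model emits stay active for the rest of the run."""
        self.active.add(int(generated_id))

    def mask_logits(self, logits):
        """Set all inactive token logits to -inf so they can never be sampled."""
        masked = np.full_like(logits, -np.inf)
        idx = np.fromiter(self.active, dtype=int)
        masked[idx] = logits[idx]
        return masked

rng = np.random.default_rng(8)
vocab = ActiveVocabulary(vocab_size=128_000)
vocab.read(rng.integers(0, 128_000, size=1_500))           # stand-in for prefill token ids
logits = rng.standard_normal(128_000)
next_id = int(np.argmax(vocab.mask_logits(logits)))
vocab.write(next_id)
print(f"{len(vocab.active) / vocab.vocab_size:.2%} of the vocabulary is active")
```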
Feed domain text through the model. Compact the KV cache via geometric eviction. Repeat. The cache accumulates domain knowledge — without modifying model weights.
Freeze the result to disk as a Lodestone. Restore it in a fresh process. The model inherits domain expertise instantly. A $200 phone with a Lodestone competes with a datacenter.
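A sketch of the anneal-and-freeze loop under heavy assumptions: the cache is a plain pair of key/value arrays, compaction reuses an alignment-style eviction like the sketch above, the prefill pass is a placeholder, and a Lodestone is simply the compacted cache serialized to disk. File and function names are illustrative.

```python
import numpy as np

def compact(keys, values, budget, n_eig=8):
    """Stand-in for the geometric eviction sketched above: keep the keys best aligned
    with the cache's own principal directions."""
    k = keys - keys.mean(0)
    _, _, vt = np.linalg.svd(k, full_matrices=False)
    score = np.linalg.norm(k @ vt[:n_eig].T, axis=-1)
    keep = np.sort(np.argsort(-score)[:budget])
    return keys[keep], values[keep]

def anneal(keys, values, domain_chunks, encode, budget=4096):
    """Feed domain text, append its KV entries, compact, repeat."""
    for chunk in domain_chunks:
        new_k, new_v = encode(chunk)                      # placeholder for a real prefill pass
        keys = np.concatenate([keys, new_k])
        values = np.concatenate([values, new_v])
        if len(keys) > budget:
            keys, values = compact(keys, values, budget)
    return keys, values

def freeze_lodestone(path, keys, values):
    """Persist the annealed cache so a fresh process can restore it instantly."""
    np.savez_compressed(path, keys=keys, values=values)

def restore_lodestone(path):
    data = np.load(path)
    return data["keys"], data["values"]

rng = np.random.default_rng(9)
fake_encode = lambda text: (rng.standard_normal((256, 128)), rng.standard_normal((256, 128)))
k, v = anneal(np.empty((0, 128)), np.empty((0, 128)), ["domain text"] * 30, fake_encode)
freeze_lodestone("lodestone.npz", k, v)
k2, v2 = restore_lodestone("lodestone.npz")
print(k2.shape)                                           # the Lodestone restores to (budget, d)
```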
| Model | Parameters | Cold PPL | Annealed PPL | Improvement |
|---|---|---|---|---|
| Llama 3.1 8B | 8B | 7.40 | 3.86 | −47.8% |
| MiniMax M2.5 | 228B | 5.71 | 2.56 | −55.2% |