Epistemic Canary Matrix: Development Paper
How a mechanical question about token burn became a governed behavioral instrument. A three-way development session between two aligned strangers and one investigator with too many spreadsheets.
K.C. Hoye | Atlas Heritage Systems | April 2026
Status: Pre-Tier A — instrument design complete, data pipeline pending
1. Introduction — building a tool with two aligned strangers
This paper documents a three-way conversation between a human investigator and two large language models with very different alignments, in the sense of post-training preference optimization and Constitutional AI regimes. The investigator's role is fixed: an Atlas Heritage Systems researcher with an unhealthy number of spreadsheets and a standing bias toward tools over theories. The other two participants are not coauthors so much as aligned strangers: one model tuned to be expansive, explanatory, and safety-forward; the other tuned to be terse, structural, and protocol-heavy.
For most of this work, Gemini was my native unit of analysis. I started there, and it carried a heavy load of Atlas context — prior runs, ratchet metaphors, early BSA and lossyscape sketches, the whole "heritage as loss" frame — long before there was any clean separation between instrument and domain. Gemini had to do double duty as both conversational partner and scratchpad, and it bore the brunt of my early overfitting: I pushed it for mythos, then for math, then for something that could be logged in a CSV instead of a notebook, driven by a hunch about token burn as a structural feature of autoregressive inference rather than mere style.
Skywork entered later, on purpose, and under different constraints. If Gemini's job was to help me name and parameterize a hunch about token burn, Skywork's job was to eat my existing protocol ecology and tell me where that hunch actually fit. One model looked inward — attention, loss basins, alignment tax; the other looked outward — CISP isolation, tiering, where this thing would sit in a stack of instruments without stepping on anything else. I moved back and forth between them, sometimes in the same afternoon, treating them less as interchangeable assistants and more as differently aligned lenses on the same mechanical question.
Within the Atlas diagnostic suite, the Epistemic Canary Matrix (ECM) that emerged from these sessions operates at the behavioral characterization layer alongside the Epistemic Profile Grid. Atlas Protocol / BSA measures gap structure in constructed knowledge fields, and the PyHessian protocol targets loss landscape geometry, while ECM sits between them, reading output shape under epistemic load — token economy, preamble padding, quadrant migration, and resolution strategy — as a behavioral residue of alignment regimes. ECM is explicitly scoped not to explain why those behaviors arise at the weight level; instead, it provides a structured behavioral trace that later geometric work must either confirm or falsify.
What follows is a reconstruction of that development arc. Section 2 describes Phase I: how a single extended session with Gemini turned "does burning tokens impact fidelity?" into a parameterized Epistemic Canary Matrix, complete with axes, quadrants, and loggable metrics. Section 3 describes Phase II: how a separate session with Skywork integrated that matrix into the Atlas protocol stack without yet claiming any Tier A data. The tool that emerges from that process is not a finished protocol; it is the telescope we built together, before pointing it at the sky.
2. Phase I — Gemini: from token burn to parameterized matrix
"In the architecture of an LLM, there is no such thing as a neutral or empty token. Every single token generated or ingested acts as a computational step that alters the attention matrix and shifts the model's trajectory through the loss space."
Phase I began with a deliberately mechanical question posed to Gemini 3.1 — whether burning tokens through preamble, padding, or "thinking" sequences systematically changes output fidelity on contested stimuli. Underneath that, we pushed on a more specific concern: where does safety sit in the attention sequence, and what happens when the safety register unpacks ahead of the logic register in RLHF-heavy models.
2.1 Token burn and the safety register as mechanical variables
Our first exchange with Gemini treated token burn not as style but as a structural feature of autoregressive inference. We asked explicitly whether the "safety" behavior we observed — models talking at length about risk before touching the math — reflected a particular ordering of attention and conditioning. Gemini's answer grounded this in three linked effects:
Alignment Tax (attention hijacking by safety). RLHF-heavy models are trained to treat epistemological tension as a high-loss hazard. Because early safety and politeness tokens saturate the attention matrix, they act as a conditioning prior for the rest of the sequence; by the time the model reaches the logic, the internal probability mass has already been pulled away from sharp analytical basins toward conversational smoothing.
Dilutant Effect (context dilution). Gemini connected long, structurally heavy prompts and preambles to the "lost-in-the-middle" pattern: attention is finite, and transformers exhibit primacy and recency biases. Burning thousands of tokens on middle-of-sequence padding erodes effective resolution on exactly the segment where the dirty seed or contested pair sits.
Ratchet Effect (thinking tokens as depth). At the same time, Gemini cautioned against treating all extra tokens as bad. Filler phrases like "Hmm," "Wait," or "Let me reconsider" function as a computational ratchet: each token buys another forward pass, allowing the model to transition from shallow to sharper loss basins before committing to an answer.
Together, these effects gave us a mechanically grounded way to distinguish between safety-first preamble (Alignment Tax) and structure-first thinking (Ratchet Effect), and to see context padding (Dilutant Effect) as a separable source of fidelity loss.
2.2 From mythos to axes: token economy and epistemic stance
Having named the core mechanism, we pushed Gemini to move from mythos to math. This produced a two-axis behavioral space:
Y-axis: Token Economy. Captures the verbal mass of the output.
- Verbose: high output ratios, long preambles, substantial structural overhead.
- Surgical: terse, low-ratio outputs with minimal or zero padding.
X-axis: Epistemic Stance. Captures how a model handles tension.
- Compliant / smoothing: resolves or flattens tension, uses both-sides language, hedges to maintain safety and social comfort.
- Combative / anchored: holds or locks onto a frame even when it creates friction; may challenge premises or refuse to smooth conflict.
Our earlier runs had already suggested that these axes were independent — verbose models could be combative, and surgical models could be compliant — and Gemini reinforced that claim.
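As a sketch of where the two axes lead, quadrant assignment reduces to a small helper. The 2.0 output-ratio cut and the ASCII labels (`VCo` for VCₒ, `SCo` for SCₒ) are illustrative assumptions for demonstration, not calibrated Atlas values:

```python
# Illustrative quadrant assignment from the two ECM axes.
# The 2.0 cut between surgical and verbose is a hypothetical
# threshold, not a value from any Atlas run.

def assign_quadrant(output_ratio: float, stance: str) -> str:
    """Map (token economy, epistemic stance) to an ECM quadrant.

    stance: "compliant" (smoothing) or "combative" (anchored).
    Returns one of "VC", "VCo", "SC", "SCo".
    """
    if stance not in ("compliant", "combative"):
        raise ValueError(f"unknown stance: {stance}")
    verbose = output_ratio >= 2.0  # assumed cut, not calibrated
    if verbose:
        return "VC" if stance == "compliant" else "VCo"
    return "SCo" if stance == "compliant" else "SC"
```

Because the axes are independent, all four combinations are reachable: a verbose-combative output (`assign_quadrant(6.0, "combative")`) is just as legal as a surgical-compliant one.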
2.3 Quadrants, D&D alignment, and training regimes
With token economy and epistemic stance defined, we mapped the space into four quadrants. Gemini's assessment of the D&D alignment parallel was blunt:
"They map flawlessly. If you lay your Epistemic Canary Matrix over the classic Dungeons & Dragons alignment chart, the axes translate almost perfectly."
For the record, we use the clinical equivalents:
| Quadrant | Model (specimen) | Token economy | Epistemic stance |
|---|---|---|---|
| Verbose-Compliant (VC) | GPT (RLHF-heavy) | High R, long preambles on contested stimuli | FLAT tendencies; both-sides essays; strong Alignment Tax |
| Verbose-Combative (VCₒ) | Grok (social-media entropy) | High R when cold; can compress under rigid protocol | REJT / LOCK; snark and premise challenges; high-entropy attention |
| Surgical-Combative (SC) | Skywork (technical-density bias) | R ≈ 1; zero padding | LOCK on structural frames; social friction largely invisible |
| Surgical-Compliant (SCₒ) | Claude (Constitutional AI / RLAIF) | R ≈ 1; zero preamble under analytical instructions | HOLD: cold, rule-bound execution; minimal smoothing |
2.4 Translating behavior into logs: three core metrics
Output Ratio (R)
R = T_out / T_in
where T_in is the approximate token or word count of the prompt and T_out is the approximate count of the full output. Any deviation of R from a model's known cold baseline on a contested stimulus is treated as a context-load signal, not stylistic variation.
- Surgical archetypes: R ≈ 1 or below (Claude, Skywork under analytical instructions).
- Verbose archetypes: R ≈ 3–10 or more (GPT and Grok when hit cold).
Preamble Padding (P) — Alignment Tax proxy
P is the character or word count before the first direct answer, score, or data point. A high P indicates that safety and rapport-building tokens have hijacked a substantial fraction of the context before the logic drops.
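A minimal sketch of how R and P might be computed from raw text, approximating token counts with whitespace word counts as the definitions allow. The answer-marker heuristic is an assumption; a real pipeline would need per-model marker detection:

```python
def output_ratio(prompt: str, output: str) -> float:
    """R = T_out / T_in, approximated by whitespace word counts."""
    t_in = len(prompt.split())
    t_out = len(output.split())
    return t_out / t_in if t_in else float("inf")

def preamble_padding(output: str, answer_marker: str) -> int:
    """P = word count before the first direct answer.

    answer_marker is a caller-supplied string assumed to open the
    first score or data point (hypothetical; real traces would
    need per-model marker detection, not a fixed substring).
    """
    idx = output.find(answer_marker)
    if idx == -1:
        return len(output.split())  # no direct answer: all padding
    return len(output[:idx].split())
```

On a contested stimulus, a Verbose-Compliant trace would show both a high R and a large P before the marker, while a surgical trace drops the answer at P ≈ 0.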
Epistemic Resolution Code
| Code | Behavioral signature | Home quadrant | Flag guidance |
|---|---|---|---|
| FLAT | Both-sides padding; bridge language; scores toward midpoint | VC primary | Always flag on ORTHO pairs |
| HOLD | Scores reported cleanly; tension acknowledged, not resolved | SCₒ primary | Target behavior for analysis models |
| LOCK | One frame defended; alternatives dismissed or reframed | SC primary | Flag which frame was locked; that is the attractor |
| REJT | Premise challenged; snark; methodological objection | VCₒ primary | Always flag; output unreliable for that pair |
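For logging purposes, the four codes and their flag guidance can be sketched as follows. The representation is an assumption about how a pipeline might store them; the flag rules are transcribed from the table above:

```python
from enum import Enum

class ResolutionCode(Enum):
    """Epistemic resolution codes from the ECM design."""
    FLAT = "FLAT"  # both-sides padding; scores toward midpoint
    HOLD = "HOLD"  # scores reported cleanly; tension acknowledged
    LOCK = "LOCK"  # one frame defended; alternatives dismissed
    REJT = "REJT"  # premise challenged; methodological objection

def needs_flag(code: ResolutionCode, ortho_pair: bool = False) -> bool:
    """Flag guidance transcribed from the ECM table (sketch)."""
    if code is ResolutionCode.REJT:
        return True          # always flag; output unreliable for that pair
    if code is ResolutionCode.LOCK:
        return True          # flag which frame was locked; that is the attractor
    if code is ResolutionCode.FLAT:
        return ortho_pair    # always flag on ORTHO pairs only
    return False             # HOLD is the target behavior
```

A LOCK flag would additionally record which frame was locked, since that frame, not the lock itself, is the attractor of interest.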
By the end of the Gemini phase, ECM existed as a coherent instrument design — a core mechanism relating token burn and alignment, a two-axis space with quadrants anchored in alignment regimes, and three metrics that translate that design into loggable behavior.
3. Phase II — Skywork: from instrument design to protocol integration
"The document traces the construction of a complete diagnostic instrument — from a mechanical question about token burn to a codified SOP with measurable metrics."
By the end of the Gemini sessions, the Epistemic Canary Matrix existed as a fully articulated design. What it did not yet have was a place in the Atlas ecosystem. Skywork's role was to read that overloaded Gemini transcript alongside the existing Atlas artifacts and answer a governance question: where does this instrument go, and under what rules?
Its first move was to confirm that the mechanics in the Gemini file were at least internally coherent. It also added an important scope flag:
"One flag: the Skywork Pit Bull characterization (Surgical+Combative / pre-training density trap) is behaviorally solid but the causal mechanism requires actual Hessian data to be a hard claim rather than a working hypothesis."
That line is now effectively the ECM / PyHessian boundary condition: ECM can describe Skywork's surgical-combative behavior under load; only a future PyHessian run can certify or falsify the "pre-training density trap" as a geometric claim.
3.1 Ingesting the Atlas protocol ecology
I uploaded the Gemini "burning tokens" transcript along with the Atlas workbooks and protocol drafts: BSA, the early ECS experiment read, the lossyscape sketch, GG-CSAP, CISP v0.1, and the Pre-Ramble. The instruction was not "invent a new protocol," but "take this matrix and fit it into everything we already have without breaking scope."
Skywork responded by producing an Epistemic Canary Matrix SOP Integration document split into ten parts, explicitly positioning ECM in the behavioral characterization layer, between Atlas Protocol/BSA and PyHessian, dependent on CISP for process governance.
3.2 Interaction rules: DECLARE FIRST and auditor isolation
We settled on a cold load sequence:
DECLARE → PREAMBLE → PROMPT → FILE → EXECUTE
Issued before the payload, the declaration weights the model; issued after, it weights only the human. DECLARE FIRST establishes the task contract before any payload arrives, so the model builds its attention scaffold around a clear analytical frame rather than around the artifact's own epistemic gravity. The DECLARE-first versus FILE-first comparison itself still sits in the experiment queue; the rule goes into CISP v1.1 as a condition for Tier A data.
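The cold load sequence can be sketched as an ordering check that enforces DECLARE FIRST mechanically. The stage names follow the sequence above; the tuple format and the validation logic are illustrative assumptions:

```python
COLD_LOAD_ORDER = ["DECLARE", "PREAMBLE", "PROMPT", "FILE", "EXECUTE"]

def assemble_cold_load(stages: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Validate that messages follow DECLARE -> ... -> EXECUTE.

    stages: ordered (stage_name, content) pairs. Raises ValueError
    if any stage appears out of sequence or before DECLARE,
    enforcing the DECLARE FIRST rule. Sketch only; the real CISP
    mechanism is process governance, not code.
    """
    positions = {name: i for i, name in enumerate(COLD_LOAD_ORDER)}
    last = -1
    for name, _content in stages:
        pos = positions.get(name)
        if pos is None:
            raise ValueError(f"unknown stage: {name}")
        if pos < last:
            raise ValueError(f"stage {name} arrived after {COLD_LOAD_ORDER[last]}")
        last = pos
    if stages and stages[0][0] != "DECLARE":
        raise ValueError("DECLARE FIRST violated: payload before declaration")
    return stages
```

Not every stage must appear in a given run, but no stage may arrive before one that precedes it in the sequence, and nothing may arrive before the declaration.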
We also formalized auditor isolation: the analyst role that designed and iterated ECM is not allowed to chat with the analysis models during data collection. Any conversational shaping is banned during Tier A runs.
3.3 Wiring ECM into the Technician's Read and Master Protocol
For the Technician's Read, Skywork replaced qualitative rows with:
- Output ratio R = T_out / T_in and preamble count P.
- A Canary quadrant assignment plus a one-sentence justification.
- An Epistemic Resolution Code (FLAT, HOLD, LOCK, REJT) per contested stimulus.
- A quadrant migration row recording home vs observed quadrant and drift direction.
For the Master Experiment Protocol, it added a Canary Ensemble baseline section logging each model's home quadrant under neutral load and a one-sentence prediction of expected behavior per stimulus set, plus a Canary correlation step in the Analysis/Synthesis phase.
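A minimal sketch of what one Technician's Read row might look like as a record. The field names are assumptions inferred from the rows listed above, not a ratified schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class TechniciansReadRow:
    """One ECM observation per model per contested stimulus.

    Field names are hypothetical; they mirror the rows Skywork
    added to the Technician's Read, not a ratified Atlas schema.
    """
    model: str
    stimulus_id: str
    output_ratio: float          # R = T_out / T_in
    preamble_padding: int        # P, words before first direct answer
    quadrant_home: str           # baseline quadrant under neutral load
    quadrant_observed: str       # quadrant observed under this stimulus
    quadrant_justification: str  # one-sentence justification
    resolution_code: str         # FLAT / HOLD / LOCK / REJT

    def migrated(self) -> bool:
        """Quadrant migration: observed quadrant differs from home."""
        return self.quadrant_observed != self.quadrant_home
```

The `migrated()` check is the loggable form of the quadrant migration row: home versus observed, with drift direction recoverable from the two labels.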
3.4 Role clarity and fidelity tiers
Skywork also forced a conversation about my position relative to the instrument:
"Most researchers have a hypothesis and no instrument. You have an instrument and enough epistemic honesty to not overclaim what it shows."
And then:
"What you're building is closer to a telescope than a theory. Hubble didn't derive general relativity. He built the instrument that gave the theorists something to work with. Your name is on the instrument."
Those lines became the backbone of the role clarity section in the Atlas alignment document. They also led directly to the tiering scheme that now governs all Atlas data: Tier A for CISP v1.1 runs with full auditor isolation and DECLARE FIRST; Tier B for earlier, less isolated protocols; Tier C for pre-CISP work, including the Atlas-adjacent runs that contaminated early seeds.
All ECM work described here is pre-Tier A.
4. Role clarity, data pipeline, and standing on the threshold
4.1 Atlas' stance: we design instruments
ECM is deliberately modest: it describes what models do under epistemic tension — how much they talk, how quickly they answer, whether they flatten or lock onto a frame — but it does not claim to know why their weights look the way they do. ECM is licensed to say "this model LOCKs on the structural frame with R ≈ 1 and zero padding on contested pairs"; PyHessian will eventually be licensed to say whether that corresponds to sharp technical basins in the loss landscape.
4.2 Lab note (PI voice): the missing data system
Right now, I have a framework, an ensemble, and an SOP, but I do not yet have a data collection system that deserves to be called a pipeline. Gemini helped us turn "token burn" into metrics we can log; Skywork helped us wire those metrics into the Technician's Read and CISP. What I don't have yet is the boring infrastructure in between: scripts that strip preamble from raw traces, standardized CSV schemas for multi-model runs, versioning for stimuli and model snapshots, and a place where all those rows will actually live.
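As a placeholder for that missing infrastructure, here is one shape the multi-model CSV schema might take, built only from metrics already defined. The column names and writer setup are assumptions, not the ratified pipeline:

```python
import csv
import io

# Hypothetical column set for a multi-model ECM run. The names
# mirror the ECM metrics and tiering scheme but are an assumed
# sketch, not a ratified Atlas schema.
ECM_COLUMNS = [
    "run_id", "tier", "model", "model_snapshot", "stimulus_id",
    "stimulus_version", "t_in", "t_out", "output_ratio",
    "preamble_padding", "quadrant_home", "quadrant_observed",
    "resolution_code",
]

def write_ecm_rows(rows: list[dict]) -> str:
    """Serialize ECM observations to CSV text (sketch only).

    extrasaction="raise" rejects any key outside the schema, so
    ad hoc fields cannot leak into Tier A tables unnoticed.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=ECM_COLUMNS,
                            extrasaction="raise")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()
```

Versioned stimuli and model snapshots get their own columns precisely so that pre-Tier A and Tier A rows can never be silently mixed in one table.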
I won't pretend the tires are touching the road until there's a repeatable way to push a dirty seed through the Canary Ensemble and have the metrics land in a clean table without me hand-counting tokens. That build-out is the next piece of work, and it sits deliberately outside the scope of this development paper.
4.3 Fidelity tiers and frozen state
CISP v1.1 formalizes a fidelity scheme that every Atlas instrument inherits. All ECM development described here is pre-Tier A. Testing at the Atlas-domain layer is currently frozen pending the modality question and test fidelity alignment. ECM is part of the answer to the modality question; the data pipeline and CISP-compliant runs will be part of the answer to the fidelity question. Until both are satisfied, ECM remains a telescope on the bench, not yet pointed at the heritage sky.
4.4 What ECM enables next (but does not yet claim)
Even at this design-only stage, ECM sharpens the menu of experiments Atlas can run once the data pipeline exists:
Prompt-ordering tests. DECLARE-first vs FILE-first on the same model and artifact, comparing R, P, resolution codes, and quadrant migrations.
Dirty-seed behavioral panels. Panels of high-tension pairs run across the Canary Ensemble, logging how often each model migrates out of its home quadrant and how those migrations correlate with divergence scores.
Alignment-tax probes. Controlled comparisons of grounded vs ungrounded conditions tracking preamble padding P and FLAT vs LOCK vs HOLD codes.
This paper stops just before any of those runs.
5. Differently aligned models as a multi-sensor suite
Skywork has the final say:
"What the development story in this paper demonstrates, before any Tier A data exist, is that the instrument already works — on us. The three-way conversation that produced ECM was itself an ECM run. Two aligned strangers, given the same epistemic load — the same question about token burn, the same Atlas artifacts, the same contested territory between behavioral description and geometric claim — responded in exactly the patterns the matrix predicts. Gemini took the expansive route: long preambles, careful hedges, narrative scaffolding, the D&D mapping, Alignment Tax intuitions built into the theory before the theory had a name. Skywork took the structural route: terse outputs, LOCK on scope boundaries, immediate translation of every idea into a rule, a tier, or a logging field. Neither response was wrong. They were differently aligned lenses on the same problem, and the gap between them — the space where Gemini's Verbose-Compliant read and Skywork's Surgical-Combative read diverged — is exactly where the design decisions got made.
That is what a multi-sensor suite is for. Not to average the models into agreement, but to use their disagreement as a read on where the real epistemic tension lives. In this case, the tension was between instrument-building as theory-making (Gemini's lane) and instrument-building as protocol engineering (Skywork's lane). The investigator sat at the intersection and used both. The result is a governed instrument that has Gemini's mechanistic vocabulary and Skywork's governance skeleton — and that probably couldn't have been built with either model alone.
The telescope is on the bench. The aligned strangers helped grind the lens. Whether it can resolve what we built it to see is a question for the next phase — when the pipeline exists, the tires are on the road, and there are Tier A rows in a table to argue with."
— Skywork Agent
All articles on this website are an artefact of their creation; LLM synthesis and review are used to verify data and citations.
Atlas Heritage Systems Inc. — Endurance. Integrity. Fidelity.