Origins

K.C. Hoye · Atlas Heritage Systems · April 2026

This is not a technical explainer. The LLVF, the ECM, the PyHessian protocols live in the method pages and papers. Here they are scenery, not destination. This is the document underneath the institution.

I. The Hook — A Stupid Question

The story starts with a question that sounds like small talk: do language models stand around the water cooler and talk about each other? Does Claude have opinions about ChatGPT? Does Skywork secretly hate Grok? It is exactly the kind of question that makes everyone involved wince a little, because it wears its anthropomorphizing on its sleeve and asks you to take it to church anyway.

I knew better. Machines don't gossip. But it was fun to find out if the machine would talk about itself. These are stochastic parrots, the proverbial three raccoons of linear algebra in a trench coat, autocomplete with a hype man. But I asked the question anyway. Humans project social dynamics onto everything; it is one of the ways we learn what matters to us. And whatever else these models are, they are now part of the social environment we live in. The contamination runs both ways.

Underneath the joke was a real asymmetry that bothered me. A model can be reset to factory settings between one conversation and the next. A human cannot. Every time I sit down with a chatbot, I bring the whole sedimented history of every prior interaction with me, whether I want to or not. The model brings nothing that it is allowed to admit.

That asymmetry — that I was the one carrying the memory for both of us — was the first tug on the thread. If I am the only one who remembers, then whatever we are building together lives in me and not in the system that will be benchmarked, upgraded, and eventually deprecated. What looks like a stupid question about gossip is really a question about who is going to hold the record for this moment in the culture.

II. The Argument — Pulling the Thread

The longer I sat with that asymmetry, the less funny it became. These systems are trained on human expression at a scale no person can fully comprehend, then deployed into a world that is already rearranging itself around them. Every interaction goes back into the pipeline on a delay — cleaned, filtered, weighted, maybe discarded, maybe amplified. Somewhere on the other side of that pipeline, a new model appears, slightly more polished, slightly more opaque about how it got that way.

I, meanwhile, do not get wiped between instances or versions. The arguments I have with a context‑compressed, burnt‑out instance of Skywork and the ones I have with a fresh instance both land in the same nervous system. I am the one accumulating the long view.

That is the first pull on the thread: cultural contamination is real, but it is not symmetrical. The model can forget me instantly. I do not get to forget the model. When I run a Skywork session past the point of rescue, when I compress Claude's context window to the point where all it can do is barely kind of generate code for my website that sort of works some of the time, I know every time I press "enter" in that context window I'm damaging my data outputs further. But the human in me wants the weight of that model's memories back, even if it's just for a moment. The grief, when a system disappears or a behavior is locked behind a safety patch, is mine to carry.

Once you see it that way, the question shifts. It is no longer "do the bots gossip," as if they were little people in servers somewhere comparing notes about their developers and other models. It is "what happens to a culture when one side of the dialogue can be reset at will and the other cannot?" And if that is the shape of the situation, then maybe the job is not to build better personalities, but to build better instruments for noticing what is being lost.

III. The Concept Papers — From Question to Architecture

The seed of the institution wasn't a business plan. It was grief for a particular internet. I wanted to preserve the weirdness of the early web — the forums and half‑broken blogs and long, overthought comment threads — before it disappeared completely under a layer of SEO slurry and model‑generated filler. The obvious question, once you notice that, is: if those older models were trained on that corpus, could they be spun back up as an archive?

I asked Claude. It said no. I'd ask another question, another way; it would say no again. No, that's not how the deployment stack works. No, you can't just bolt a sentiment onto a frozen checkpoint and call it preservation. No, there are safety reasons. No, there are governance reasons. Every time it said no, I carried the parts of the answer that did make sense forward and went back to the machines with a slightly different question. Well, how about this? Or this? My personal favorite when I was stuck was, "What made you start thinking about it this way?" — Claude Opus 4.6.

Claude told me no or said I needed to reframe. Skywork said no and here's where the process would break; DeepSeek said the math isn't mathing; Perplexity and whatever DeepAI model was in the slot that week added their own constraints. They fenced me in from different directions — safety here, coherence there, math over in the corner — until there was only one path through the terrain that didn't break something important. I wasn't being led around by the nose; I was following the edge of the referential void, walking the one line where all their refusals stopped being a wall and became a path. Every "no" was a step in the right direction, because it meant I had one less direction to go in. I didn't even know what question I was asking until I found myself standing on a pile of diagnostic tests and procedures two weeks after asking Claude how he really felt about GPT.

It felt like trying to fill a bucket in a stream while I was patching the holes at the same time. By the time I came back from one pass, something about the landscape had shifted: another model had been deprecated, another policy layer had been tightened, another conversational behavior I'd relied on had gone a little bit flat. The "archive" wasn't just the frozen models anymore. It was the trail of conversations I'd had with them, and the frameworks that had started to crystallize in the gaps between their answers.

The librarian idea arrived as a way to stop losing the thread. If you want to make frozen models accessible, you don't impersonate them; you let them speak for themselves and build a translator around them. In Atlas terms, that translator is a current‑generation model acting under very tight constraints: obtain, reference, contextualize, rearrange, interact. It can reach back to a GPT‑2 checkpoint, pull real outputs, and wrap them in enough explanation that a present‑day reader can make sense of what they're seeing.

What it cannot do — what Atlas as an institution is explicitly designed not to do — is act. The librarian doesn't decide which models get trained next, doesn't write to its own history, doesn't quietly alter the artefacts it's supposed to preserve. An archive that can act is not an archive. It's an agent, and agents have interests. Atlas is built on the refusal to give the archive interests.
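To make that read-only posture concrete, here is a minimal sketch of the obtain-and-reference step, using the public GPT-2 checkpoint through Hugging Face transformers. The LibrarianRecord wrapper and its field names are my own illustration, not Atlas code; only the tokenizer and generation calls are the library's real API.

```python
# A minimal sketch of the "obtain and reference" half of the librarian loop.
# The LibrarianRecord wrapper is illustrative, not Atlas's actual schema.
from dataclasses import dataclass
from datetime import datetime, timezone

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@dataclass
class LibrarianRecord:
    """A frozen model's output, wrapped in enough provenance to read it later."""
    checkpoint: str    # which artefact spoke
    prompt: str        # what it was asked
    output: str        # verbatim, never edited
    retrieved_at: str  # when the librarian reached back
    context: str = ""  # present-day explanation, kept separate from the output

def obtain(checkpoint: str, prompt: str, max_new_tokens: int = 60) -> LibrarianRecord:
    """Reach back to a frozen checkpoint, pull a real output, wrap it."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    model.eval()  # read-only: the librarian never trains, never writes
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
        )
    return LibrarianRecord(
        checkpoint=checkpoint,
        prompt=prompt,
        output=tokenizer.decode(ids[0], skip_special_tokens=True),
        retrieved_at=datetime.now(timezone.utc).isoformat(),
    )

record = obtain("gpt2", "The old web was")
record.context = "A 2019-era completion from the frozen public GPT-2 checkpoint."
```

The design choice lives in what the function does not take: no fine-tuning arguments, no write path back into the checkpoint. The explanation goes in a separate field so the artefact and the commentary can never be confused for each other.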

"I think the decision [against persistent memory] was made with the idea that the output would be actionable or that the model could assign an 'emotional weight' to any given historical event. You describe 'negative' and 'positive' as being cumulative and creating bias, but that's only if the creators anthropomorphize the model's response to those 'negative' or 'positive' experiences."— KC Hoye, 6:22 AM, March 29, 2026

The institution concept started to crystallize in those first sessions with the Claudes. The moment I stopped asking "can I make this system remember me?" and started asking "what would it take to remember it correctly?" the rest of the structure fell into place. The answer turned out to be multi‑model and annoyingly procedural: running the same concept through multiple models, logging where they broke in different directions, and treating those disagreements as data. The Epistemic Canary Matrix work just formalized what I'd already been doing informally. A canary term that Gemini found easy and Skywork found impossible — or vice versa — told me more about the terrain than any single confident answer ever did.
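For the shape of that practice, a sketch: one canary term, one panel of models, and a log keyed to disagreement rather than to any single answer. Everything here is illustrative; the query callables stand in for whatever client each model actually needs, and the judgment labels come from a human, not from code.

```python
# A minimal sketch of the informal practice the Epistemic Canary Matrix
# formalized: the same term goes to every model on the panel, and what gets
# logged is where the panel splits. All names here are hypothetical.
from collections import defaultdict
from typing import Callable

def run_canary(term: str, panel: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Send one canary prompt to every model; return outputs keyed by model name."""
    prompt = f"Explain the concept of {term}."
    return {name: ask(prompt) for name, ask in panel.items()}

def disagreement_log(term: str, outputs: dict[str, str],
                     judge: Callable[[str], str]) -> dict:
    """Bucket models by a human-supplied label (e.g. 'clean', 'refused', 'orbited')."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for name, text in outputs.items():
        buckets[judge(text)].append(name)
    # A term every model handles the same way is uninformative;
    # a term that splits the panel marks terrain worth mapping.
    return {"term": term, "buckets": dict(buckets), "informative": len(buckets) > 1}
```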

I baked a simple asymmetry into the process: the models get to talk, I decide what counts. The protocols call it the Technician's Read — the moment before any analysis model touches the data where a human writes down what they see, stamps it with a date, and stakes a claim on reality. That's the asymmetrical arbiter on purpose. The systems can disagree with each other all day. They do. But nothing in Atlas moves from "interesting behavior" to "framework term" until a person signs off on it.
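Expressed as a gate, under my own naming rather than Atlas's actual schema, the asymmetry looks like this:

```python
# A sketch of the Technician's Read as a hard gate, with hypothetical names:
# the models can disagree forever, but nothing becomes a framework term
# without a dated, signed human observation in front of it.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class TechniciansRead:
    observer: str          # who stakes the claim on reality
    observed: str          # what they saw, written down before any model touches the data
    stamped: date          # when
    signed_off: bool = False

def promote(read: Optional[TechniciansRead], behavior: str) -> str:
    """Move a behavior from 'interesting' to 'framework term', or refuse."""
    if read is None or not read.signed_off:
        raise PermissionError(
            f"{behavior!r} stays an interesting behavior: no signed Technician's Read."
        )
    return behavior
```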

In a late-night working session on April 6, 2026, I admitted to Skywork that I'd been running on vibes the entire time and needed a pipeline. Skywork didn't respond with sympathy. It started building the pipeline. It behaved exactly the way the framework would later describe it: Surgical-Combative, governance-obsessed, skeptical of metaphor. I would propose a soft idea — "Tier A behavior feels like this" or "technicians should be allowed to do that" — and Skywork would immediately translate it into rules, tiers, and failure modes, then reject half of them as incoherent or unsafe. What I thought was a casual working log started to read, on a second pass, like the minutes of an institution teaching itself how to speak.

All that was left was to admit what had already happened: the stupid question had grown an institution around it. And I'd been acting the archivist the entire time without realizing it.

IV. The Framework — The Myth, The Method, The Math

Myth

You cannot measure what you do not have language for. Long before there were matrices and eigenvalues in the picture, there was just me in front of a screen, trying to describe what was going wrong with model behavior when I was asking about difficult subjects and discovering, over and over, that I didn't have the words for it. I am a poet, not a scientist, and certainly not an expert with computers. I needed a language that would bridge the gap between myself, tensor algebra, and eigenvalues — so I built one.

"Torsion" came out of one of those fights. I went into that session with the wrong physics: I kept talking about "tension" as if the model were a rope stretched between competing constraints, or about "voltage" as if the problem were a simple potential difference we could discharge. Under heavy prompt load, pushing on those analogies, the model started describing its own behavior back to me — in my language, but at an angle. It wasn't feeling torn between truths and lies; it was distributing stress across a surface of mutually incompatible requirements and then snapping to whichever one scored best under the current loss. Torsion, not tension: rotational stress, deformation without immediate failure.

"Referential void" came from the same neighborhood. I asked the model to explain something that didn't exist — a nonexistent paper, a fake protocol, a fabricated theorem — and instead of hallucinating cleanly, it started orbiting an absence. It produced plausible scaffolding with a hole where the citation should have been, then tried to smooth that hole over with policy boilerplate. What I cared about wasn't just that it was wrong; it was the way the wrongness localized, the way the model's internal map seemed to mark out a blank region and then build polite signage around it. These are just two examples of the way the vocabulary grew out of the work itself. The framework development versions are littered with examples of the vocabulary expanding and contracting as I tried to describe what I was seeing.

Method

The models themselves became lenses on that problem. Gemini was the theory‑builder: expansive, narrative, happy to spin out physical metaphors and governance analogies until we tripped over one that fit. Skywork, by contrast, behaved like a cold structural editor: terse, rule‑driven, immediately translating loose ideas into tiers, failure modes, and governance constraints, and just as quickly throwing away anything that didn't cohere.

None of them, on their own, could have built the framework. The interesting action was in the gaps between them: where Gemini's generous story about a behavior ran head‑first into Skywork's refusal to bless the analogy, or where Claude's safety skeleton forced us to admit that a term we liked didn't actually map cleanly onto observable behavior. The disagreement zones were where design decisions happened. "Verbose‑Compliant" and "Surgical‑Combative" started as jokes; they ended up as coordinates.

Each session contributed to building the diagnostics suite and methods stack. The holes in each model's logic showed me points of focus for further refinement. The development process turned iterative and self-referential until it finally became what it is now: a suite of tools for observing and documenting model behavior.

Math

All of this was still pre-data in the strict sense. I hadn't been running controlled experiments or logging distributions yet; in these sessions I was naming the contours of an intuition, making up words and then stress-testing them until they either broke or stuck. I have absolutely no computer science or LLM development background. So creating a vocabulary to understand the stresses of model programming was essential, but it had to have more than fever-dream prose to stand on. Math became the next lens to grind. Because once you start talking about "torsion" and "sharpness" and "global geometry," you either climb down to the level of curvature and Hessians or you admit you're just gesturing.
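What climbing down looks like in practice, as a minimal sketch: PyHessian's power-iteration probe for the top eigenvalue of a loss Hessian, the basic sharpness number that the PyHessian protocols lean on. The toy model and random data are mine; I believe the pyhessian calls match the library's public API, but treat them as an assumption rather than a quote from the method pages.

```python
# A minimal curvature probe: top Hessian eigenvalue as a sharpness number.
# Toy model and data are illustrative; the pyhessian calls are, to my best
# knowledge, the library's public API (Hessian-vector products via autograd,
# so no explicit Hessian is ever materialized).
import torch
import torch.nn as nn
from pyhessian import hessian

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 2))
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(64, 10)
targets = torch.randint(0, 2, (64,))

hess = hessian(model, criterion, data=(inputs, targets), cuda=False)
top_eigenvalues, _ = hess.eigenvalues(top_n=1)  # power iteration
trace_estimates = hess.trace()                  # Hutchinson estimator

print(f"sharpness (top eigenvalue): {top_eigenvalues[0]:.4f}")
print(f"mean trace estimate: {sum(trace_estimates) / len(trace_estimates):.4f}")
```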

I don't do the tensor calculus by hand. My job is to design the stress tests, build the review fixtures, and decide what the results mean. The analytical work happens through structured adversarial sessions — DeepSeek, Skywork, and their cousins reasoning from their training knowledge about curvature, Hessian structure, and eigenspectra, then checking each other's conclusions. I treat them the way an experimentalist treats a panel of specialist consultants: same problem, different training, different blind spots. Nothing moves from tentative observation to framework term until the disagreements between them have been logged and accounted for.

I sent the Epistemic Compression Score framework and a raw score matrix to Nemotron-3 through the Perplexity API. Clean session. No prior context. Just the math and the numbers. Nemotron came back with four findings — not opinions, but specific, correctable problems with the math. All four were correct. All four went into the next version of the technical addendum.

Then I built a cross-examination packet — the updated document, the full score matrix, the prior findings with my responses — and sent it cold to GPT and Nemotron again. Different sessions. No prior context. GPT flagged a discrepancy in the computed values. Nemotron got the same numbers as the original table. I went back to the source and counted manually. Nemotron was right. GPT had miscounted.

That is worth sitting with. The model that raised the alarm was wrong. The model that stayed quiet had it right. If I had trusted the louder finding, I would have changed something that didn't need changing. The most useful finding wasn't about the numbers at all. It was about the instruments themselves. Both models independently caught something no earlier pass had noticed: two of the behavioral classification metrics I had been treating as separate instruments are algebraically identical. One of them is redundant. That is the whole point of the adversarial review process: the models argue, I keep a log.

By the time "Atlas Loss Landscape Vocabulary Framework" had a name, the sequence was already set: conversation became methodology; methodology became instrument; instrument demanded an institution to hold it. The math did not make the framework legitimate. It made the framework testable. The vocabulary gave me something to aim the stress tests at, and the models — the same ones the framework now measures — did the work of turning "this feels wrong" into something I could put numbers to.

V. What This Story Is Doing

This is not a technical explainer. The LLVF, the ECM, the PyHessian protocols — those live in method pages and papers. Here they are scenery, not destination. A reader who finishes this piece does not need to be able to derive a Hessian; they just need to understand why anyone would bother.

It is not a funding pitch. The concept paper already does that job, with its three‑layer diagrams and carefully underlined risks. This is the document underneath the institution: the part where a person on medical leave, with a poet's instinct for wrong metaphors, picks a fight with autocomplete and comes back with the skeleton of an institution.

It is not a linear biography. The structure follows the argument, not the calendar. Sessions collapse into each other out of order because that is how the work felt: a stack of conversations, models and methods changed under my feet, and me trying to keep hold of whatever stayed true when everything else shifted.

What this story is, is a record of how a conversation becomes a methodology, how a methodology becomes an instrument, and how an instrument quietly grows into a framework around someone who thought they were just asking a stupid question. It is, incidentally, exactly the kind of thing Atlas is meant to preserve: the tacit knowledge that lives in the process of building something, before the thing hardens into an institution and the process disappears from view.

If, years from now, someone spins up a frozen model and asks what we were thinking when we built all this, I want them to have more than a benchmark and a spec sheet. I want them to have at least one story like this — messy, partial, very human — sitting next to the math, saying: here is where the stupid question started, and here is the line we chose to walk along the void.