Loss Landscape Vocabulary Framework

v13 · April 2026 · Atlas Heritage Systems · Working document — not a finished product

A note before the math

You don't need to understand any of this to read the Framework. But if you want to know why the Framework is built the way it is, the math is where the answer lives.

When a language model trains, it moves through a mathematical landscape — hills, valleys, flat plains — searching for the lowest point. The vocabulary on this page describes the features of that terrain: what makes one region harder to cross than another, what gets preserved in the difficult parts, and what gets smoothed away in the easy ones.

The archaeological claim Atlas makes is simple: the hard parts leave marks. Those marks are readable. That's what the instruments are built to find.

Start with the plain language description of each term. Follow the math when you need it.

How it all fits together

The Framework names the terrain. The instruments measure behavior on it. The schema defines how measurements get recorded. The protocols govern how they're taken — CISP is the governance layer that sits above every active instrument run, enforcing isolation, sequencing, and the human-judgment boundary.

Below the protocols, the automation layer handles transcription: parsing raw model output, computing what can be computed, and leaving blank what requires a Technician's call. Below that is the data the instruments produce over time — the actual record Atlas is building.

The geometry sits at the end of the chain. PyHessian doesn't measure behavior; it measures the mathematical terrain the Framework describes. When there's enough data, the Hessian eigenvalue analysis will either confirm the Framework's terrain claims or force a revision. Working hypotheses stay hypotheses until the math has something to argue with.
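To make "Hessian eigenvalue analysis" concrete, here is a minimal sketch of how the largest eigenvalue of a loss Hessian can be estimated without ever writing the Hessian down. It is not PyHessian and not Atlas's pipeline: the toy quadratic loss, the matrix A, and the function names are assumptions made for illustration. The technique is power iteration on Hessian-vector products, with each product approximated by a central difference of the gradient.

```python
import numpy as np

# Hypothetical toy loss L(theta) = 0.5 * theta^T A theta, so the true Hessian is A.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 0.2]])

def grad(theta):
    """Analytic gradient of the toy loss."""
    return A @ theta

def hvp(grad_fn, theta, v, eps=1e-4):
    """Hessian-vector product via a central difference of the gradient."""
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2 * eps)

def top_eigenvalue(grad_fn, theta, iters=200, seed=0):
    """Power iteration: repeatedly apply the Hessian to a unit vector."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(theta.shape)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, theta, v)
        lam = float(v @ hv)           # Rayleigh quotient for the current direction
        v = hv / np.linalg.norm(hv)   # renormalize for the next step
    return lam, v

theta0 = np.zeros(3)
lam, v = top_eigenvalue(grad, theta0)
```

A large top eigenvalue means the terrain curves sharply in at least one direction at that point; a small one means it is comparatively flat there. The same loop works for any loss whose gradient can be evaluated, which is why it scales to models where the full Hessian is too large to store.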

Ablation & the Drift Vector

Ablation removes structural elements — weights, heads, layers — changing the dimensionality of the landscape. Every qualifier shifts simultaneously. The drift vector and drag coefficient delta are proposed metrics with known structural problems identified by Skywork adversarial review.

Ablation · what changes

Removal of parameters, heads, or layers. Does not simply reduce the model — changes the topology of the space the model navigates. Every qualifier shifts simultaneously.

Michel et al. (2019), "Are Sixteen Heads Really Better than One?"; Brown et al. (2020), "Language Models are Few-Shot Learners" (GPT-3)

Ablation Drift Vector · proposed metric

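Displacement of the model's position in the loss landscape from before ablation to after it. Intended to indicate what the ablated components were doing. Three structural problems have been identified: (1) dimensionality mismatch; (2) distance in weight space is not functional distance; (3) removing a component after training is not the same as that component being absent during training.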

Δθ = θ_post − θ_pre
Flagged: dimensionality mismatch — before and after ablation the model lives in spaces of different dimension. Embedding choice is interpretively load-bearing and unspecified.

Li et al. (2018), "Visualizing the Loss Landscape of Neural Nets"; Meyes et al. (2019), "Ablation Studies in Artificial Neural Networks"
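The dimensionality problem flagged for the drift vector can be seen in a few lines. This is a sketch under loud assumptions: the "model" is a hypothetical 3-head weight matrix, and the two embeddings (zeroing a head, or replacing it with the mean of the remaining heads) are illustrative choices, not ones the Framework specifies. It shows that Δθ is undefined when a head is deleted outright, and that once an embedding is chosen, the drift depends on which one.

```python
import numpy as np

# Hypothetical toy parameters: 3 heads, each a weight vector of length 4.
theta_pre = np.array([[1.0, 0.0, 2.0, 0.0],
                      [0.0, 3.0, 0.0, 1.0],
                      [2.0, 2.0, 0.0, 0.0]])

def ablate_by_deletion(theta, head):
    """Remove the head's row entirely; the parameter space shrinks."""
    return np.delete(theta, head, axis=0)

def ablate_by_zeroing(theta, head):
    """Keep the shape and zero the head (one possible embedding)."""
    out = theta.copy()
    out[head] = 0.0
    return out

def ablate_by_mean(theta, head):
    """Keep the shape and replace the head with the mean of the others."""
    out = theta.copy()
    out[head] = np.delete(theta, head, axis=0).mean(axis=0)
    return out

def drift_vector(pre, post):
    """Delta theta = theta_post - theta_pre; only defined when shapes match."""
    if pre.shape != post.shape:
        raise ValueError(f"dimensionality mismatch: {pre.shape} vs {post.shape}")
    return post - pre

# Deleting the head leaves nothing to subtract from.
try:
    drift_vector(theta_pre, ablate_by_deletion(theta_pre, head=1))
    mismatch = False
except ValueError:
    mismatch = True

# Two embeddings of the same ablation give two different drift vectors.
drift_zero = drift_vector(theta_pre, ablate_by_zeroing(theta_pre, head=1))
drift_mean = drift_vector(theta_pre, ablate_by_mean(theta_pre, head=1))
norm_zero = float(np.linalg.norm(drift_zero))
norm_mean = float(np.linalg.norm(drift_mean))
```

Neither norm says anything about how much the model's outputs changed. That is the second flagged problem: a weight-space displacement is not a functional distance.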

Drag Coefficient Delta · proposed metric

Change in the resistance profile caused by ablation. Computationally expensive. Not yet a named method in the literature. Requires full knowledge of all terrain and navigator properties pre- and post-ablation.

Δb(θ) = b_post(θ) − b_pre(θ)
Proposed: the existing mathematical grounding is adjacent to this application but does not fully support it.

Meng et al. (2022), "Locating and Editing Factual Associations in GPT" (ROME); Olah et al. (2020), "Zoom In: An Introduction to Circuits"

Priority experiment: PyHessian on GPT-2 small to test whether the attention correlation probe (coupling measurement) correlates with actual Hessian off-diagonal structure. Until this is done, the Skywork collapse hierarchy finding is unconfirmed empirically.
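As a rough picture of what "Hessian off-diagonal structure" means in that experiment, the sketch below builds a toy loss over two parameter groups, forms its dense Hessian by finite differences, and scores how strongly the groups are coupled. Everything here is an assumption for illustration: the loss, the grouping, and the block_coupling score are hypothetical and are not taken from PyHessian or from the Atlas instruments. A dense Hessian is only feasible at toy scale; at GPT-2 scale any real test would have to rely on Hessian-vector products instead.

```python
import numpy as np

def make_loss(coupling):
    """Toy loss over two parameter groups: a = theta[:2], b = theta[2:]."""
    def loss(theta):
        a, b = theta[:2], theta[2:]
        return float(a @ a + b @ b + coupling * (a @ b))
    return loss

def dense_hessian(loss, theta, h=1e-3):
    """Dense Hessian from central second differences of the loss."""
    n = theta.size
    H = np.zeros((n, n))
    eye = np.eye(n)
    for i in range(n):
        for j in range(n):
            ei, ej = h * eye[i], h * eye[j]
            H[i, j] = (loss(theta + ei + ej) - loss(theta + ei - ej)
                       - loss(theta - ei + ej) + loss(theta - ei - ej)) / (4 * h * h)
    return H

def block_coupling(H, idx_a, idx_b):
    """Frobenius norm of the off-diagonal block, scaled by the diagonal blocks."""
    H_aa = H[np.ix_(idx_a, idx_a)]
    H_bb = H[np.ix_(idx_b, idx_b)]
    H_ab = H[np.ix_(idx_a, idx_b)]
    scale = np.sqrt(np.linalg.norm(H_aa) * np.linalg.norm(H_bb))
    return float(np.linalg.norm(H_ab) / scale)

theta = np.array([0.3, -0.2, 0.5, 0.1])
groups = ([0, 1], [2, 3])
score_coupled = block_coupling(dense_hessian(make_loss(0.5), theta), *groups)
score_uncoupled = block_coupling(dense_hessian(make_loss(0.0), theta), *groups)
```

In the priority experiment, a score of this kind, computed between attention components, would be the quantity the attention correlation probe is compared against. The probe itself is not sketched here.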