Loss Landscape Vocabulary Framework
v13 · April 2026 · Atlas Heritage Systems · Working document — not a finished product
A note before the math
You don't need to understand any of this to read the Framework. But if you want to know why the Framework is built the way it is, the math is where the answer lives.
When a language model trains, it moves through a mathematical landscape — hills, valleys, flat plains — searching for the lowest point. The vocabulary on this page describes the features of that terrain: what makes one region harder to cross than another, what gets preserved in the difficult parts, and what gets smoothed away in the easy ones.
The archaeological claim Atlas makes is simple: the hard parts leave marks. Those marks are readable. That's what the instruments are built to find.
Start with the plain language description of each term. Follow the math when you need it.
How it all fits together
The Framework names the terrain. The instruments measure behavior on it. The schema defines how measurements get recorded. The protocols govern how they're taken — CISP is the governance layer that sits above every active instrument run, enforcing isolation, sequencing, and the human-judgment boundary.
Below the protocols, the automation layer handles transcription: parsing raw model output, computing what can be computed, and leaving blank what requires a Technician's call. Below that is the data the instruments produce over time — the actual record Atlas is building.
The geometry sits at the end of the chain. PyHessian doesn't measure behavior; it measures the mathematical terrain the Framework describes. When there's enough data, the Hessian eigenvalue analysis will either confirm the Framework's terrain claims or force a revision. Working hypotheses stay hypotheses until the math has something to argue with.
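The eigenvalue analysis that paragraph leans on can be sketched without PyHessian itself. The power-iteration routine below (the function name and the toy matrix are illustrative, not Atlas code) estimates the top eigenvalue of a symmetric operator using only matrix-vector products, which is the same trick Hessian tooling uses to avoid ever materializing the full Hessian.

```python
import math

def power_iteration(hvp, dim, iters=200):
    """Estimate the top eigenvalue of a symmetric operator given only
    Hessian-vector products. A uniform starting vector is used here for
    simplicity; real tooling starts from a random vector."""
    v = [1.0 / math.sqrt(dim)] * dim
    eigval = 0.0
    for _ in range(iters):
        w = hvp(v)
        # Rayleigh quotient v.Hv with |v| = 1 gives the eigenvalue estimate.
        eigval = sum(wi * vi for wi, vi in zip(w, v))
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return eigval

# Toy "Hessian": a 2x2 symmetric matrix with eigenvalues 3 and 1.
H = [[2.0, 1.0], [1.0, 2.0]]
hvp = lambda v: [sum(H[i][j] * v[j] for j in range(2)) for i in range(2)]
top = power_iteration(hvp, 2)  # approximately 3.0
```

On a real model, `hvp` would be a backpropagation-based Hessian-vector product over the loss; the iteration itself is unchanged.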
Navigator Properties
The model's dynamic relationship to terrain: how it moves through it, resists it, accumulates history, and distributes probability mass. Conjugate to terrain properties: precise measurement along one axis structurally degrades precision along the other. Note: Skywork adversarial review found that the seven qualifiers collapse to three independent variables (density, coupling, elasticity) plus four derived readouts (perplexity, probability, viscosity, memory).
Density
Training data coverage across input space. Coarse means sparse coverage and weak gradient signal; fine means dense coverage and steep, well-defined valleys.
Kullback & Leibler (1951)
Perplexity
Average surprise: the exponential of cross-entropy. High perplexity marks unmapped or contested terrain. The scar tissue of turbulent training lives in the high-perplexity regions of a deployed model.
Shannon (1948); Manning & Schütze (1999), ch. 3
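The cross-entropy relationship can be checked in a few lines. The helper below is an illustration, not part of any instrument:

```python
import math

def perplexity(probs):
    """Perplexity = exp(cross-entropy): the exponential of the average
    surprise, in nats, over the probabilities assigned to observed tokens."""
    cross_entropy = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(cross_entropy)

# A model that assigns each observed token probability 0.25 has
# perplexity 4: it is as surprised as a uniform 4-way guess.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # approximately 4.0
```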
Probability
Output distribution sharpness at inference. High-probability outputs correspond to sharp, narrow valleys; low-probability outputs correspond to flat regions or saddle points.
Bishop (2006), Pattern Recognition and Machine Learning, ch. 4
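One common numerical stand-in for output sharpness is the entropy of the output distribution; a peaked distribution has low entropy, a flat one has high entropy. The mapping to entropy is an illustration here, not a Framework definition:

```python
import math

def entropy(probs):
    """Shannon entropy in nats; low entropy means a sharp, peaked output."""
    return -sum(p * math.log(p) for p in probs if p > 0)

sharp = [0.97, 0.01, 0.01, 0.01]   # narrow valley: one option dominates
flat = [0.25, 0.25, 0.25, 0.25]    # flat region: no option preferred
print(entropy(sharp) < entropy(flat))  # True
```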
Coupling
Inter-parameter dependency: how much moving one weight moves others. High coupling means parameter updates propagate widely. Causally determines viscosity via the eigenvalue spectrum.
Sagun et al. (2017); Dauphin et al. (2014)
Viscosity
Resistance to movement under gradient pressure. Icky means high resistance: flat, wide minima where competing orientations persist longer. Determined by coupling via the eigenvalue spectrum.
Keskar et al. (2016); Foret et al. (2020), sharpness-aware minimization
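As a sketch of the coupling-to-viscosity claim, assume coupling shows up as Hessian off-diagonals and viscosity as eigenvalue spread (an assumption consistent with the entries above, not a derivation). The 2x2 case makes the mechanism visible:

```python
import math

def eig2(h):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]],
    returned as (smaller, larger)."""
    a, b, c = h[0][0], h[0][1], h[1][1]
    mean = (a + c) / 2
    d = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mean - d, mean + d

# Uncoupled parameters: diagonal Hessian, equal curvature in every direction.
print(eig2([[2.0, 0.0], [0.0, 2.0]]))   # (2.0, 2.0)

# Strongly coupled parameters: the off-diagonal term splits the spectrum
# into one flat direction and one steep one.
print(eig2([[2.0, 1.9], [1.9, 2.0]]))   # approximately (0.1, 3.9)
```

The wider the spread between the smallest and largest eigenvalue, the more the landscape mixes flat and steep directions at the same point.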
Elasticity
Restoring force toward prior weight configurations after perturbation. Catastrophic forgetting is the total loss of elasticity.
Kirkpatrick et al. (2017); Krogh & Hertz (1992), weight decay
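The cited Kirkpatrick et al. (2017) work (elastic weight consolidation) operationalizes this restoring force as a quadratic penalty pulling weights back toward a prior configuration, weighted by per-parameter importance. A minimal sketch, with hypothetical parameter names:

```python
def elastic_penalty(theta, theta_star, importance, lam=1.0):
    """EWC-style restoring force: penalize drift from the prior weights
    theta_star, weighted by per-parameter importance (the diagonal of the
    Fisher information in Kirkpatrick et al. 2017)."""
    return lam / 2 * sum(f * (t - ts) ** 2
                         for f, t, ts in zip(importance, theta, theta_star))

# No drift, no penalty; drift along an important direction costs more.
print(elastic_penalty([1.0, 1.0], [1.0, 1.0], [5.0, 0.1]))  # 0.0
print(elastic_penalty([2.0, 1.0], [1.0, 1.0], [5.0, 0.1]))  # 2.5
print(elastic_penalty([1.0, 2.0], [1.0, 1.0], [5.0, 0.1]))  # 0.05
```

In this picture, elasticity is the strength of that penalty; setting `importance` to zero everywhere removes the restoring force entirely, which matches the entry's reading of catastrophic forgetting.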
Memory
Path dependency encoded in the weights: not the weights themselves, but the history of how the model traveled through the loss landscape during training. Cannot be measured independently of viscosity in a frozen, deployed model.
Li et al. (2018); Goodfellow et al. (2014)