Loss Landscape Vocabulary Framework
v13 · April 2026 · Atlas Heritage Systems · Working document — not a finished product
A note before the math
You don't need to understand any of this to read the Framework. But if you want to know why the Framework is built the way it is, the math is where the answer lives.
When a language model trains, it moves through a mathematical landscape — hills, valleys, flat plains — searching for the lowest point. The vocabulary on this page describes the features of that terrain: what makes one region harder to cross than another, what gets preserved in the difficult parts, and what gets smoothed away in the easy ones.
The archaeological claim Atlas makes is simple: the hard parts leave marks. Those marks are readable. That's what the instruments are built to find.
Start with the plain language description of each term. Follow the math when you need it.
How it all fits together
The Framework names the terrain. The instruments measure behavior on it. The schema defines how measurements get recorded. The protocols govern how they're taken — CISP is the governance layer that sits above every active instrument run, enforcing isolation, sequencing, and the human-judgment boundary.
Below the protocols, the automation layer handles transcription: parsing raw model output, computing what can be computed, and leaving blank what requires a Technician's call. Below that is the data the instruments produce over time — the actual record Atlas is building.
The geometry sits at the end of the chain. PyHessian doesn't measure behavior; it measures the mathematical terrain the Framework describes. When there's enough data, the Hessian eigenvalue analysis will either confirm the Framework's terrain claims or force a revision. Working hypotheses stay hypotheses until the math has something to argue with.
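The kind of measurement this describes can be previewed on a toy surface. A minimal sketch in plain Python (illustrative only, not PyHessian or the Atlas tooling): a finite-difference Hessian of a two-parameter loss, whose eigenvalues separate the sharpest direction of the terrain from the flattest.

```python
import math

def loss(x, y):
    # Toy loss: a narrow valley, steep in x and shallow in y.
    return 4.0 * x**2 + 0.1 * y**2

def hessian(f, x, y, h=1e-4):
    # Central finite differences for the 2x2 Hessian at (x, y).
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h**2)
    return fxx, fxy, fyy

def eigenvalues_2x2(a, b, c):
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]].
    mean = (a + c) / 2.0
    spread = math.sqrt(((a - c) / 2.0)**2 + b**2)
    return mean - spread, mean + spread

a, b, c = hessian(loss, 0.3, 0.3)
lo, hi = eigenvalues_2x2(a, b, c)
# hi / lo is the condition number: how much sharper the sharpest
# direction is than the flattest. Here hi is near 8 and lo near 0.2.
```

On a real model the Hessian is far too large to form explicitly, which is why tools like PyHessian estimate its top eigenvalues instead; the geometric claim being tested is the same.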
Terrain Properties
Properties of the loss surface itself — the fixed mathematical landscape that training navigates. Formally defined by L(θ) and its derivatives across parameter space. The landscape exists as a mathematical object, but it is readable only through dynamics: its shape is accessible only by probing it through movement.
Gradient magnitude. First derivative of loss with respect to parameters. Steep regions produce fast directed movement; shallow regions produce stall or drift.
Cauchy (1847) gradient descent; Rumelhart et al. (1986) backpropagation
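The steep-versus-shallow contrast can be sketched on a one-dimensional quadratic (an illustrative toy, with curvature k standing in for gradient steepness):

```python
def grad(theta, k):
    # Gradient of the 1-D quadratic loss L(theta) = k * theta**2 / 2,
    # which is k * theta: larger k means a steeper slope.
    return k * theta

def descent(theta, k, lr=0.1, steps=5):
    # Plain gradient descent: move against the gradient each step.
    for _ in range(steps):
        theta = theta - lr * grad(theta, k)
    return theta

steep = descent(1.0, k=5.0)    # large gradients: fast directed movement
shallow = descent(1.0, k=0.1)  # small gradients: near-stall
```

After five identical steps the steep region has moved almost all the way to the minimum while the shallow region has barely left its starting point.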
Temperature. Two related usages. Training: noise amplitude in SGD — hot allows escape from local minima, cold traps the optimizer in them. Output: softmax sharpness — high temperature flattens the probability distribution, low temperature concentrates it.
Hinton et al. (2015) knowledge distillation temperature
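The output usage is a one-line transformation: divide the logits by T before the softmax. A self-contained sketch:

```python
import math

def softmax_with_temperature(logits, T):
    # High T flattens the distribution toward uniform;
    # low T concentrates it on the largest logit.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
hot = softmax_with_temperature(logits, T=10.0)   # near-uniform
cold = softmax_with_temperature(logits, T=0.1)   # near one-hot
```

This is the same temperature Hinton et al. (2015) use for distillation: the hot distribution exposes the relative ranking of the non-argmax classes, which the cold one hides.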
Signal degradation between gradient computation and weight update. Smooth means clean gradient transmission. Abrasive means conflicting or noisy gradients opposing movement.
Kingma & Ba (2014) Adam optimizer
Local curvature relative to step size. Slick means overshoot risk (low curvature, large steps). Sticky means undershoot or entrapment (high curvature, small effective movement).
Keskar et al. (2016) sharp minima and generalization
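The interaction between curvature and step size has a standard closed form on a one-dimensional quadratic, which makes a useful sketch (illustrative, not a claim about any particular training run): gradient descent on L(theta) = k * theta² / 2 converges only when the learning rate stays below 2 / k.

```python
def run(theta, k, lr, steps=50):
    # Gradient descent on L(theta) = k * theta**2 / 2.
    # Each update multiplies theta by (1 - lr * k), so the iterates
    # shrink toward zero only when lr < 2 / k.
    for _ in range(steps):
        theta -= lr * k * theta
    return abs(theta)

k = 10.0                          # local curvature
stable = run(1.0, k, lr=0.05)     # lr < 2/k = 0.2: converges
diverged = run(1.0, k, lr=0.25)   # lr > 2/k: every step overshoots further
```

The same step size that is safe in one region can be unstable in a sharper one, which is why curvature only matters relative to step size.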
Competing gradient forces from different loss terms. Loose means weak opposing forces. Tight means strong competing gradients creating narrow stable corridors of movement.
Sener & Koltun (2018) multi-task learning
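A minimal sketch of competing loss terms (toy objectives chosen for illustration): two terms pull a single parameter toward opposite targets, and where their gradients balance, strong individual forces produce no net movement.

```python
def grad_total(theta, w1=1.0, w2=1.0):
    # Two loss terms pulling toward different targets: term 1 wants
    # theta = +1, term 2 wants theta = -1. Their gradients oppose.
    g1 = 2.0 * (theta - 1.0)   # gradient of (theta - 1)**2
    g2 = 2.0 * (theta + 1.0)   # gradient of (theta + 1)**2
    return w1 * g1 + w2 * g2

# At theta = 0 each term's gradient has magnitude 2, but the
# combined gradient vanishes: tight tension, no net movement.
pull_1 = 2.0 * (0.0 - 1.0)
pull_2 = 2.0 * (0.0 + 1.0)
net = grad_total(0.0)
```

Reweighting the terms shifts the equilibrium rather than removing it, which is why tight regions form narrow corridors rather than open terrain.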
Landscape response to perturbation. Flexible means deformation is permanent (plastic regime). Stiff means deformation is recoverable (elastic regime).
Kirkpatrick et al. (2017) elastic weight consolidation
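The elastic regime is what EWC engineers deliberately. A sketch after Kirkpatrick et al. (2017), with illustrative names (`ewc_loss`, `fisher`, `lam` are this example's, not a library API): a quadratic penalty anchored at the old solution makes the landscape stiff around it, so deformation is recoverable; with zero penalty weight the landscape stays flexible and movement away is free.

```python
def ewc_loss(theta, task_loss, theta_star, fisher, lam=1.0):
    # EWC-style objective (sketch): quadratic penalty anchored at the
    # old solution theta_star. Large Fisher weights make the landscape
    # stiff (elastic); zero weights leave it flexible (plastic).
    penalty = sum(f * (t - ts) ** 2
                  for f, t, ts in zip(fisher, theta, theta_star))
    return task_loss(theta) + (lam / 2.0) * penalty

new_task = lambda th: (th[0] - 2.0) ** 2   # new objective pulls theta toward 2
stiff = ewc_loss([1.0], new_task, theta_star=[0.0], fisher=[10.0])
flexible = ewc_loss([1.0], new_task, theta_star=[0.0], fisher=[0.0])
```

At theta = 1 the stiff objective pays 5.0 of penalty on top of the 1.0 task loss, while the flexible one pays nothing: the same movement is resisted in one regime and free in the other.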
Elevation. Raw loss value. Vertical position on the loss surface. The entire training objective is elevation descent. Identical elevation values can correspond to completely different terrain configurations.
Choromanska et al. (2015) loss surface topology
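The last point, that identical elevation says nothing about terrain, can be made concrete with two toy basins (illustrative functions, not drawn from any model):

```python
def sharp(theta):
    # Narrow valley: high curvature around the minimum.
    return 100.0 * theta ** 2

def flat(theta):
    # Broad basin: same minimum elevation, tiny curvature.
    return 0.01 * theta ** 2

def curvature(f, x, h=1e-4):
    # Second derivative by central finite difference.
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2

# Identical elevation at the bottom of both basins...
same_elevation = (sharp(0.0), flat(0.0))
# ...but the surrounding terrain differs by four orders of magnitude.
terrains = (curvature(sharp, 0.0), curvature(flat, 0.0))
```

Any instrument that logs only the loss value records the first pair and misses the second, which is the gap the geometric measurements are meant to close.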