Loss Landscape Vocabulary Framework

v13 · April 2026 · Atlas Heritage Systems · Working document — not a finished product

A note before the math

You don't need to understand any of this to read the Framework. But if you want to know why the Framework is built the way it is, the math is where the answer lives.

When a language model trains, it moves through a mathematical landscape — hills, valleys, flat plains — searching for the lowest point. The vocabulary on this page describes the features of that terrain: what makes one region harder to cross than another, what gets preserved in the difficult parts, and what gets smoothed away in the easy ones.

The archaeological claim Atlas makes is simple: the hard parts leave marks. Those marks are readable. That's what the instruments are built to find.

Start with the plain-language description of each term. Follow the math when you need it.

How it all fits together

The Framework names the terrain. The instruments measure behavior on it. The schema defines how measurements get recorded. The protocols govern how they're taken — CISP is the governance layer that sits above every active instrument run, enforcing isolation, sequencing, and the human-judgment boundary.

Below the protocols, the automation layer handles transcription: parsing raw model output, computing what can be computed, and leaving blank what requires a Technician's call. Below that is the data the instruments produce over time — the actual record Atlas is building.

The geometry sits at the end of the chain. PyHessian doesn't measure behavior; it measures the mathematical terrain the Framework describes. When there's enough data, the Hessian eigenvalue analysis will either confirm the Framework's terrain claims or force a revision. Working hypotheses stay hypotheses until the math has something to argue with.
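To make the eigenvalue step concrete, here is a minimal sketch on a toy two-parameter loss. It is not PyHessian and not the Atlas pipeline: `toy_loss`, `numerical_hessian`, and the step size `eps` are hypothetical names chosen for illustration, and a dense finite-difference Hessian is only practical at this tiny scale.

```python
import numpy as np

def numerical_hessian(loss, w, eps=1e-4):
    """Central-difference estimate of the Hessian of `loss` at point `w`."""
    w = np.asarray(w, dtype=float)
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n)
            e_j = np.zeros(n)
            e_i[i] = eps
            e_j[j] = eps
            H[i, j] = (
                loss(w + e_i + e_j) - loss(w + e_i - e_j)
                - loss(w - e_i + e_j) + loss(w - e_i - e_j)
            ) / (4.0 * eps**2)
    return 0.5 * (H + H.T)  # symmetrize away rounding noise

def toy_loss(w):
    # Hypothetical terrain: steep along w[0], shallow along w[1].
    return 3.0 * w[0] ** 2 + 0.5 * w[1] ** 2

H = numerical_hessian(toy_loss, [0.0, 0.0])
eigenvalues = np.linalg.eigvalsh(H)  # sorted ascending
print(eigenvalues)
```

The eigenvalues are the curvatures along the principal directions of the terrain at that point. Their signs and magnitudes are what the shape vocabulary below refers to.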

Macro-Topology Shapes

Large-scale terrain features produced by the interaction of terrain properties across parameter space. They determine where models converge, stall, or drift. The archaeological signal Atlas seeks lives in specific topological regions.

Bowl · symmetric minimum

Inward gradients from all directions. Single smooth minimum. Stable convergence. Produces laminar flow. Archaeological signal absent — remagnetization likely complete.

Goodfellow et al. (2014) neural network optimization problems

Valley · elongated minimum

Two-sided inward gradients, flat floor. Stall-prone along the valley axis. Flat minima tend to generalize better than sharp ones.

Keskar et al. (2016); Izmailov et al. (2018) stochastic weight averaging
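The stall along a valley axis can be illustrated with plain gradient descent on an elongated quadratic. This is a sketch under assumed numbers: the curvatures, learning rate, and step count are hypothetical and chosen only to make the contrast visible.

```python
import numpy as np

# Hypothetical valley: steep across (first axis), nearly flat along (second axis).
curvature = np.array([100.0, 0.01])

def grad(w):
    # Gradient of the quadratic loss 0.5 * sum(curvature * w**2).
    return curvature * w

w = np.array([1.0, 1.0])  # start one unit from the floor on both axes
lr = 0.015
for _ in range(100):
    w = w - lr * grad(w)

across, along = abs(w[0]), abs(w[1])
print(f"distance left across the valley: {across:.3e}")
print(f"distance left along the valley:  {along:.3f}")
```

After the same number of steps the walls have been descended almost completely, while progress along the floor is barely measurable. That asymmetry is the stall the definition describes.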

Saddle Point · downhill / uphill

Downhill in some parameter directions, uphill in others. High-perplexity, turbulent. Primary archaeological territory — competing orientations were never resolved into clean convergence here.

Dauphin et al. (2014) saddle point problem in high-dimensional optimization
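In Hessian terms, "downhill in some directions, uphill in others" means the eigenvalues at a critical point have mixed signs. A minimal sketch of that test follows; the function name, the tolerance, and the example matrices are assumptions for illustration, not part of any Atlas instrument.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-8):
    """Label a zero-gradient point by the signs of its Hessian eigenvalues."""
    eig = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    if np.all(eig > tol):
        return "minimum"      # uphill in every direction (bowl-like)
    if np.all(eig < -tol):
        return "maximum"      # downhill in every direction
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle"       # mixed: uphill in some, downhill in others
    return "degenerate"       # some curvature is ~0; second order cannot decide

bowl_label = classify_critical_point([[2.0, 0.0], [0.0, 2.0]])     # x^2 + y^2
saddle_label = classify_critical_point([[2.0, 0.0], [0.0, -2.0]])  # x^2 - y^2
flat_label = classify_critical_point([[0.0, 0.0], [0.0, 0.0]])     # constant loss
print(bowl_label, saddle_label, flat_label)
```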

Plateau · stalled

Near-zero gradient everywhere. No signal. Training dies silently. Highest effective drag. The vanishing gradient problem is plateau behavior.

Glorot & Bengio (2010) vanishing gradient; Ioffe & Szegedy (2015) batch normalization
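A rough way to see how gradients vanish: backpropagating through a sigmoid multiplies the gradient by the sigmoid's derivative, which is never larger than 0.25. The sketch below stacks that factor across layers. It assumes unit weights and the same pre-activation at every layer, which is a deliberate simplification, and the helper names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop_scale(pre_activation, depth):
    """Factor applied to a gradient after passing back through `depth`
    sigmoid units, assuming unit weights and identical pre-activations."""
    return sigmoid_grad(pre_activation) ** depth

best_case = backprop_scale(0.0, 10)  # derivative at its maximum, 0.25 per layer
saturated = backprop_scale(5.0, 10)  # units pushed into the flat tail
print(f"best case: {best_case:.3e}, saturated: {saturated:.3e}")
```

Even in the best case the signal shrinks by roughly a factor of a million over ten layers. With saturated units it is many orders of magnitude smaller again, which is the near-zero-gradient regime described above.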

Ridge · basin boundary

High-curvature boundary between basins. Unstable traversal. Which side the model falls to determines which attractor captures it.

Li et al. (2018) initialization and optimization role
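The "which side you fall to" point can be shown on a toy double-well surface with a ridge along x = 0. Everything here is a hypothetical illustration: the loss `(x^2 - 1)^2 + y^2`, the learning rate, and the starting offsets are assumptions, not measurements.

```python
def grad(x, y):
    # Gradient of the toy loss (x**2 - 1)**2 + y**2.
    # Minima sit at x = -1 and x = +1; the ridge runs along x = 0.
    return 4.0 * x * (x * x - 1.0), 2.0 * y

def descend(x, y, lr=0.05, steps=200):
    """Plain gradient descent from (x, y); returns the final point."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

# Two starts that differ only by a tiny offset across the ridge.
left = descend(-0.01, 0.5)
right = descend(+0.01, 0.5)
print("started left of ridge  ->", left)
print("started right of ridge ->", right)
```

The two runs begin 0.02 apart and end in different basins, one near x = -1 and one near x = +1.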

Basin · wide / narrow

The catchment region around a minimum. Wide basins tend to generalize better. Narrow basins are sensitive to perturbation. The model's training path determines which basin it occupies.

basin width ∝ 1 / max_i(λ_i), where λ_i are the eigenvalues of the local Hessian

Keskar et al. (2016); Izmailov et al. (2018)
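As a sketch of how the width proxy and the perturbation claim relate, the example below compares two hypothetical Hessians at a minimum. The matrices, the perturbation, and the helper names are assumptions for illustration; the loss rise uses the standard second-order approximation 0.5 · δᵀHδ.

```python
import numpy as np

def width_proxy(hessian):
    """The proxy stated above: inverse of the largest Hessian eigenvalue."""
    return 1.0 / np.linalg.eigvalsh(hessian).max()

def loss_increase(hessian, delta):
    """Second-order estimate of the loss rise after stepping `delta`
    away from the minimum."""
    return 0.5 * delta @ hessian @ delta

wide = np.diag([0.1, 0.05])    # hypothetical gentle curvature
narrow = np.diag([50.0, 2.0])  # hypothetical sharp curvature
nudge = np.array([0.1, 0.1])   # the same small perturbation for both

wide_width, narrow_width = width_proxy(wide), width_proxy(narrow)
wide_rise, narrow_rise = loss_increase(wide, nudge), loss_increase(narrow, nudge)
print(f"wide:   proxy={wide_width:.2f}, loss rise={wide_rise:.5f}")
print(f"narrow: proxy={narrow_width:.2f}, loss rise={narrow_rise:.5f}")
```

The same nudge costs the narrow basin several hundred times more loss than the wide one, which is the sense in which narrow basins are sensitive to perturbation.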