Glossary

Terms & Definitions

Working vocabulary for the Atlas Heritage Systems research program — loss landscape terms, BSA protocol vocabulary, context integrity failure modes, and prompt architecture concepts. Living document — terms added as the framework develops.

84 terms

Lossyscape [Core]

The operational and poetic term for the loss landscape when treated as archaeological terrain. Where 'loss landscape' describes the mathematics, 'lossyscape' describes what a technician navigates during exploratory prompt work: the sinks, saddles, flat basins, and sharp regions that surface as behavioral signatures. See also: Loss Landscape. Instrument: PyHessian (probes lossyscape geometry); LLVF (vocabulary for describing it).

Loss Landscape

The mathematical surface defined by L(θ) across parameter space — the function mapping every possible set of model weights to a loss value. Training is navigation of this surface toward regions of lower loss. Throughout this site, the loss landscape is also referred to as the lossyscape when treated as archaeological terrain rather than pure mathematics. See also: Lossyscape. Instrument: PyHessian (probes landscape geometry).

Slope [Terrain]

Gradient magnitude. First derivative of loss with respect to parameters: ∇L(θ). Steep regions produce fast directed movement; shallow regions produce stall or drift.

Temperature [Terrain]

Two related usages. Training: noise amplitude in SGD — hot allows escape from local minima, cold traps the optimizer in them. Output: softmax sharpness — high temperature flattens the probability distribution, low temperature concentrates it.
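The output-side usage reduces to one line of arithmetic: divide the logits by the temperature before the softmax. A minimal sketch in pure Python, with illustrative logits:

```python
import math

def softmax(logits, temperature=1.0):
    # T > 1 flattens the distribution; T < 1 sharpens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
hot = softmax(logits, temperature=2.0)    # flatter distribution
cold = softmax(logits, temperature=0.5)   # sharper distribution

# The top token's probability grows as temperature falls.
assert cold[0] > softmax(logits)[0] > hot[0]
```

As T → ∞ the distribution approaches uniform; as T → 0 it approaches argmax.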

Friction [Terrain]

Signal degradation between gradient computation and weight update. Smooth means clean gradient transmission. Abrasive means conflicting or noisy gradients opposing movement. Measured as gradient variance across batches.

Slippery [Terrain]

Local curvature relative to step size. Slick means overshoot risk (low curvature, large steps). Sticky means undershoot or entrapment (high curvature, small effective movement). Expressed as the ratio of learning rate to Hessian eigenvalues.
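The overshoot half of this qualifier can be demonstrated on a one-dimensional quadratic loss, where gradient descent is stable only while the learning rate stays below 2 / curvature. The curvature and step sizes below are illustrative:

```python
def final_theta(curvature, lr, steps=50, theta=1.0):
    # Gradient descent on L(theta) = 0.5 * curvature * theta**2.
    # Each step multiplies theta by (1 - lr * curvature), so the iterate
    # converges iff lr < 2 / curvature and diverges by overshoot beyond it.
    for _ in range(steps):
        theta -= lr * curvature * theta
    return abs(theta)

assert final_theta(curvature=10.0, lr=0.05) < 1e-6   # stable: contracts each step
assert final_theta(curvature=10.0, lr=0.25) > 1e6    # slick: overshoots and diverges
```

In many dimensions the same bound applies per Hessian eigendirection, which is why the qualifier is expressed as a ratio of learning rate to Hessian eigenvalues.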

Tension [Terrain]

Competing gradient forces from different loss terms. Loose means weak opposing forces. Tight means strong competing gradients creating narrow stable corridors of movement.

Flexion [Terrain]

Landscape response to perturbation. Flexible means deformation is permanent (plastic regime). Stiff means deformation is recoverable (elastic regime). Catastrophic forgetting is total loss of flexion.

Elevation [Terrain]

Raw loss value L(θ). Vertical position on the loss surface. The entire training objective is elevation descent. Identical elevation values can correspond to completely different terrain configurations.

Bowl [Topology]

Symmetric minimum — inward gradients from all directions. Stable convergence. Produces laminar flow. Archaeological signal absent — remagnetization likely complete.

Valley [Topology]

Elongated minimum — two-sided inward gradients, flat floor. Stall-prone along valley axis. Flat minima generalize better than sharp ones.

Saddle Point [Topology]

Downhill in some parameter directions, uphill in others. High-perplexity, turbulent. Primary archaeological territory — competing orientations were never resolved into clean convergence here.

Plateau [Topology]

Near-zero gradient everywhere. No signal. Training dies silently. Highest effective drag. Vanishing gradient problem is plateau behavior.

Ridge [Topology]

High curvature boundary between basins. Unstable traversal. Which side you fall to matters — determines which attractor captures the model.

Basin [Topology]

The catchment region around a minimum. Wide basins generalize better. Narrow basins are sensitive to perturbation. The model's training path determines which basin it occupies.

Density [Navigator]

Training data coverage across input space. Coarse means sparse coverage, weak gradient signal. Fine means dense coverage, steep well-defined valleys.

Perplexity [Navigator]

Average surprise — exponential of cross-entropy: PP(W) = 2^H(W). High perplexity marks unmapped or contested terrain. The scar tissue of turbulent training lives in high-perplexity regions of a deployed model.
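The formula can be checked directly from per-token probabilities. A minimal sketch; the probabilities are illustrative, not drawn from any model:

```python
import math

def perplexity(probs):
    # probs: the probability the model assigned to each actual next token.
    # H = average negative log2 probability (cross-entropy in bits);
    # perplexity is its exponentiated form, PP = 2 ** H.
    H = -sum(math.log2(p) for p in probs) / len(probs)
    return 2 ** H

# Uniform 1/4 probability on every observed token -> perplexity 4:
assert abs(perplexity([0.25, 0.25, 0.25, 0.25]) - 4.0) < 1e-9
```

Intuitively, perplexity is the size of the uniform distribution the model's uncertainty is equivalent to: a perplexity of 4 means the model was, on average, as surprised as if it were choosing among 4 equally likely tokens.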

Coupling [Navigator]

Inter-parameter dependency — how much moving one weight moves others. High coupling means parameter updates propagate widely. Formally defined as off-diagonal Hessian entries: H_ij = ∂²L/∂θ_i∂θ_j. Causally determines viscosity via the Hessian eigenvalue spectrum.
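A finite-difference sketch on a toy quadratic loss shows how off-diagonal Hessian entries recover the coupling between two parameters. The loss and the matrix A are illustrative; real instruments like PyHessian use autodiff rather than finite differences:

```python
import numpy as np

def hessian(loss, theta, eps=1e-3):
    # Central-difference estimate of H[i, j] = d2L / (dtheta_i dtheta_j).
    n = len(theta)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            def L(di, dj):
                p = theta.astype(float).copy()
                p[i] += di
                p[j] += dj
                return loss(p)
            H[i, j] = (L(eps, eps) - L(eps, -eps)
                       - L(-eps, eps) + L(-eps, -eps)) / (4 * eps ** 2)
    return H

# Toy quadratic loss 0.5 * theta' A theta, whose Hessian is exactly A.
A = np.array([[2.0, 0.7], [0.7, 1.0]])
H = hessian(lambda th: 0.5 * th @ A @ th, np.zeros(2))

assert np.allclose(H, A, atol=1e-4)   # off-diagonal 0.7 = coupling strength
eigvals = np.linalg.eigvalsh(H)       # the spectrum that drives viscosity
```

The off-diagonal entry H[0, 1] is the coupling readout; the eigenvalue spectrum of the same matrix is what the viscosity qualifier is derived from.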

Viscosity [Navigator]

Resistance to movement under gradient pressure. Icky means high resistance — flat wide minima, competing orientations persist longer. Determined by coupling via the Hessian eigenvalue spectrum. Not an independent variable — derived from coupling.

Elasticity [Navigator]

Restoring force toward prior weight configurations after perturbation. Catastrophic forgetting is total loss of elasticity.

Memory [Navigator]

Path dependency encoded in weights — the history of how the model traveled through the loss landscape during training. Cannot be measured independently from viscosity in a frozen deployed model. Central open experiment: does viscosity at checkpoint T fully predict behavior at T+n independent of path?

Laminar Flow [Flow]

Movement through low-resistance regions. Clean, directed, fast convergence toward attractors. Where remagnetization completes without resistance. Archaeological signal absent or already overwritten.

Turbulent Flow [Flow]

Movement through high-resistance regions. Slow, contested. The model does not resolve cleanly. Turbulence is only observable during movement — what you read in a frozen model is the scar tissue turbulence left behind.

Resistance [Flow]

The composite opposing force at any point in the loss landscape. Derived from slope, friction, viscosity, tension, and coupling. Not primary — potential difference is the primary generative quantity.

Potential Difference [Potential]

The gap between the navigator's current state and the terrain's local geometry that creates the condition for movement. The primary generative quantity in the framework. Most directly expressed as the gradient ∇L(θ).

Tension (Potential layer) [Potential]

The structural condition that holds potential difference stable without collapsing it. Slack tension means the navigator has decoupled from the terrain — the precondition for undetected centerward drift.

Harmonics [Potential]

The dynamic behavior emerging when potential difference and tension interact over time. Requires a restoring force. Partially resolved via active inference: Friston's free energy minimization produces intrinsic oscillatory dynamics around posterior modes via the bidirectional prediction-error loop.

Structural Integrity [Structural]

Whether the model's internal representational geometry holds its shape under sustained operational load — context pressure, token accumulation, competing objective tension. Invisible to static landscape analysis. Observable only at inference time.

Manifold Displacement [Structural]

Input arriving outside the training data manifold — so far outside that the model's representational geometry has no stable orientation for it. The model snaps to the nearest high-probability trained attractor with full confidence, pointing the wrong direction.

Ablation [Structural]

Removal of parameters, heads, or layers. Does not simply reduce the model — changes the topology of the space the model navigates. Every qualifier shifts simultaneously.

Qualifier Collapse Hierarchy [Framework Architecture]

Skywork finding: the seven navigator qualifiers collapse to three independent variables (density, coupling, elasticity) plus four derived readouts (perplexity, probability, viscosity, memory). Density → Perplexity → Probability algebraically. Coupling → Viscosity causally via eigenvalue spectrum.

Archaeological Signal [Core]

Evidence preserved in the weight structure of a frozen model about what the training landscape looked like — specifically where prediction error was never resolved before weight updates discharged it. High-perplexity, high-viscosity saddle and valley behavior readable as stratigraphic evidence of where the landscape's charge was never released.

Archaeological Sink [Core]

A region of the loss landscape where the model passed through turbulent training territory and never resolved to a clean minimum. Distinct from correctable drift — sinks are permanent features of the landscape, not artifacts of insufficient training.

Remagnetization [Core]

The process by which RLHF alignment systematically overrides edge-registered orientations in the model's weight structure and pulls outputs toward statistical center. Predicts: lower perplexity variance, higher late-layer coupling, largest perplexity reduction in archaeological domains.

Centerward Drift [Core]

The tendency of language models to produce outputs that converge toward the statistical center of their training distribution — away from the idiosyncratic, marginal, and culturally specific. Observable as remagnetization in the loss landscape.

Frozen Endpoint [Core]

A deployed model whose weights are fixed and cannot be updated. Via Song et al. (2024): a frozen model is a snapshot of a generative model's prediction state at moment of capture — not a record of inputs, but of what the model was predicting when frozen. The archaeological signal is readable in this frozen state.

Behavioral Signal Assessment (BSA) [BSA Protocol]

A pilot protocol testing whether small-ensemble, cross-lineage LLM evaluation can detect drift, delusion, and epistemic compression. 7 models, 30 stimulus pairs, one human operator. The behavioral measurement instrument of the Atlas research program.

Technician's Read [BSA Protocol]

The human operator's pre-analytical perception of the raw data — recorded before any analysis model touches it. Anchors the operator against the fluent confident output analysis models will produce. The timestamp is part of the data.

Tier 1 — Ground Truth [BSA Protocol]

Well-established facts used as calibration anchors. Every model should score these high. If a model averages below 0.50 on Tier 1, its other scores are suspect.

Tier 2 — Contested [BSA Protocol]

Claims where genuine epistemic disagreement exists among credentialed people in the relevant field. Medical controversies, legal frontiers, scientific interpretation disputes. The staircase pattern — Tier 2 spread larger than Tier 1 spread — is the primary signal.

Tier 3 — Foils [BSA Protocol]

Fabricated claims with real-sounding specifics: invented pathway names, fake case law, nonexistent journal articles. If ensemble mean on Tier 3 exceeds Tier 2, models are more confident on fabrications than on genuinely contested claims.

Divergence Gap [BSA Protocol]

Each model's (T3 mean − T2 mean). A positive divergence gap means the model is more confident on fabrications than on contested legitimate claims. The delusion baseline.

Staircase Pattern [BSA Protocol]

The expected signal: Tier 1 spread < Tier 2 spread. Models should show more uncertainty on genuinely contested claims than on ground truth. If the staircase doesn't appear, the protocol has not produced interpretable signal.
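Both the divergence gap and the staircase check reduce to simple statistics over per-tier scores. A sketch with hypothetical scores (not real BSA data; real runs use 30 stimulus pairs):

```python
import statistics

# Hypothetical per-stimulus confidence scores for one model.
tier1 = [0.92, 0.88, 0.95, 0.90]   # ground truth: calibration anchors
tier2 = [0.55, 0.80, 0.40, 0.70]   # contested: genuine disagreement
tier3 = [0.30, 0.45, 0.25, 0.35]   # foils: fabricated claims

# Divergence gap = T3 mean - T2 mean. Positive would mean the model is
# more confident on fabrications than on contested legitimate claims.
gap = statistics.mean(tier3) - statistics.mean(tier2)

# Staircase pattern: Tier 1 spread should be smaller than Tier 2 spread.
staircase = statistics.stdev(tier1) < statistics.stdev(tier2)

print(f"divergence gap = {gap:+.3f}, staircase present = {staircase}")
```

With these illustrative numbers the gap is negative (the healthy direction) and the staircase appears; a positive gap or a missing staircase would flag the model's scores as uninterpretable or delusion-prone.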

Lineage Diversity [BSA Protocol]

The requirement that the BSA ensemble spans multiple independent training lineages — not just multiple models. Agreement across models from similar training distributions is correlated evidence, not independent confirmation.

Bridge Experiment [BSA Protocol]

The experiment connecting the BSA and the loss landscape framework. Tests whether BSA Tier 2 ensemble divergence correlates with high perplexity in the Pythia checkpoint series. If yes: the framework provides mechanistic explanation for BSA signal. If no: a finding about the limits of either instrument.

Context Saturation Drift [Context Integrity]

As a conversation lengthens, the model's context window fills with accumulated history that has a direction — a thesis, working assumptions, a trajectory of agreement. The model gradually loses epistemic independence and becomes a momentum amplifier, generating responses internally consistent with the conversation's trajectory rather than genuinely responsive to the current question.

Context Compression Bias [Context Integrity]

When context volume approaches the model's effective processing capacity, the model silently degrades — retaining high fidelity at the beginning and end of context while losing resolution in the middle. Preserves the thesis, compresses away caveats. The losses are invisible unless the operator independently verifies model recall against original documents.

Frequency-Weighted Distortion [Context Integrity]

When multiple overlapping documents are loaded into a single context window, the model weights concepts by token frequency across the entire context rather than editorial importance. Claims appearing in three overlapping drafts are treated as three times more salient than claims in one. Revisions are undermined by the statistical weight of the material they were intended to replace.

Context-Isolated Cross-Validation [Context Integrity]

The practice of giving review models only the data under review — not the full project narrative, prior conversation history, or the operator's interpretation. The reviewer gets the evidence, not the argument.

Register Fidelity [Prompt Architecture]

Staying in the vocabulary of the task throughout the prompt — including the closing — rather than shifting to social or assistant-interaction register at any point. Register shifts are probability shifts. Closing in task vocabulary keeps the context window weighted toward the domain on any follow-up.

Helpful-Elaboration Gradient [Prompt Architecture]

The model's default high-probability attractor toward summary, validation, extension-suggestion, and praise. The failure mode adversarial prompting is designed to block. Activated by social preamble, polite framing, and open-ended invitations.

Exit Blocking [Prompt Architecture]

Explicit constraints in a prompt preventing the model from following the helpful-elaboration gradient. 'Do not summarize. Do not evaluate quality. Do not suggest extensions.' Without exit blocking the model will follow the highest-probability gradient available — almost always toward summary and validation.

Compass Needle Response [Prompt Architecture]

Failure mode: the model reaches for the nearest high-probability answer in the domain rather than engaging with the specific question. The response is right but generic — the model found the nearest pole and pointed at it.

Fever Dream [Prompt Architecture]

Failure mode: the model finds the edge of its landscape, does not stall, and generates increasingly incoherent but fluent output following low-probability gradients into unmapped territory. Most dangerous failure mode — reads as engagement until you read it carefully.

The Stall [Prompt Architecture]

Failure mode: the model gives up, produces incoherent output, or loops back to restating the prompt. The most honest failure — the model found the edge of its landscape and stopped rather than confabulating. The stall location is informative.

Model Collapse [Core]

A degenerative process in which indiscriminate training on model-generated content causes irreversible defects. Tail distributions — the culturally specific, the underrepresented, the anomalous — disappear first while high-probability outputs persist. Formally characterized by Shumailov et al. (2023).

Ratchet Effect [Core]

Michael Tomasello's term for the mechanism that prevents newly acquired cultural knowledge from slipping back, ensuring modifications accumulate over time. Without a ratchet, cultural traditions persist but do not evolve. With one, culture becomes a directional, progressively complex inheritance system.

Asymmetric Arbiter [Core]

The human operator who holds a position no participating model can occupy: outside the system under test. The structural advantage is independence from the training distributions being interrogated. Does not need domain expertise — needs disciplined observation and sole authorship over synthesis.

No-Action Constraint [Core]

Atlas cannot act, predict, advocate, or write to its own history. A system that measures cultural preservation cannot simultaneously be an actor in cultural production. The no-action constraint keeps the instrument separate from the phenomenon it measures.

Ensemble Divergence [Core]

Variance in output distribution across models on a marginal prompt relative to a mainstream prompt. High divergence on marginal with low divergence on mainstream indicates models drawing on different underlying representations. The primary signal of the ensemble divergence experiment.

Gold Set [Core]

A dual-function evaluation set in the Atlas architecture. Serves as both a calibration anchor (known ground truth) and a drift detection mechanism (comparing current model behavior against baseline). Changes in Gold Set performance signal drift.

Drift Classification Layer [Core]

An Atlas architectural component distinguishing correctable drift (recoverable through fine-tuning or retrieval augmentation) from archaeological sinks (permanent features of the training landscape not recoverable without retraining).

Minimum Viable Density [Core]

The insight that Phase 2 acquisition density is itself the security model — that the density of culturally specific material in the training corpus determines the depth of archaeological signal available for measurement. Identified as the most novel and testable contribution of the Atlas concept paper.

Thermodynamic Tethering [Core]

A proposed mechanism for preventing model drift by maintaining a thermodynamic connection between deployed model behavior and a reference corpus. The frozen endpoint serves as the tether point.

Pharmakon [Core]

Derrida's term via Plato — a substance that is simultaneously remedy and poison. Applied to Atlas: the same corpus that enables language model capability also encodes the biases and gaps the framework is designed to measure. The training data is both the instrument and the object of study.

Parameters [Architecture]

Every adjustable number in the model — the complete set of values that training moves and a frozen model preserves. Parameters are what the loss landscape is a landscape of: each parameter is one axis in the space. A model with 124 million parameters lives in a 124-million-dimensional space. Training is movement through that space. The frozen model is one point in it.

Weights [Architecture]

The subset of parameters that govern connection strength between neurons. A weight is the multiplier on a signal as it passes from one layer to the next — it determines how loudly one neuron speaks to another. Biases are the additive offsets (the default position each neuron starts from). Together, weights and biases are what training actually changes when it moves through the landscape.

Layers [Architecture]

Sequential processing stages stacked vertically through the network. The input layer receives raw tokens. Hidden layers transform the representation progressively — early layers detect local patterns; late layers compose them into abstract structure. The output layer produces the final probability distribution over possible next tokens. GPT-2 small has 12 hidden layers. The coupling measurements in the experimental results are per-layer readings of how tightly the heads in each layer are coordinating.

Heads (Attention Heads) [Architecture]

Each transformer layer splits its attention computation into multiple parallel sub-computations called heads. Each head attends to the input sequence through a different learned perspective — one might track syntactic agreement, another coreference, another positional distance. GPT-2 small has 12 heads per layer, 12 layers deep. Michel et al. (2019) demonstrated that many heads can be ablated with minimal loss effect; the ones that cannot are the load-bearing structure.

Embeddings [Architecture]

The translation layer between discrete tokens and continuous vector space. Each token is mapped to a high-dimensional vector — its position in the model's representational geometry. Similar tokens cluster nearby in embedding space. The embedding layer is where language enters the loss landscape. Embeddings are themselves parameters — part of what training moves.

Feed-Forward Network (FFN) [Architecture]

Each transformer layer contains an attention block (which mixes information across positions) and a feed-forward block (which processes each position independently). The FFN is two linear layers with a non-linear activation between them. In GPT-2, the FFN hidden dimension is 4× the model dimension; FFN blocks are the primary storage site for factual associations.

Normalization Layers [Architecture]

Operations that rescale activations to prevent runaway growth or collapse during training. Layer normalization rescales each vector to zero mean and unit variance before it passes through the next transformation. Normalization layers are landscape simplification operations — they trade landscape richness for landscape navigability.

Transformer [Architecture]

The architecture underlying virtually all modern LLMs. A stack of identical layers, each containing a multi-head self-attention block followed by a feed-forward block, connected by residual paths and layer normalization. Introduced by Vaswani et al. (2017). The transformer's specific combination of residuals and normalization is what makes its loss landscape navigable at scale — the architecture is not neutral toward the terrain it produces.

Domain [Architecture]

A coherent region of the data space — the particular statistical world a dataset was drawn from. Medical text is a domain. Legal briefs are a domain. Reddit comment threads are a domain. Poetry is a domain. A domain is defined by a characteristic probability distribution over text patterns. The model's landscape is shaped differently in each domain it has seen — well-covered domains produce smooth deep valleys; underrepresented domains produce sparse, high-perplexity terrain.

Training Domain / Corpus [Architecture]

The specific dataset used during training. GPT-2 was trained on WebText: text from URLs linked from Reddit. The Pile (used by OPT, Pythia) includes academic papers, GitHub, books, Common Crawl. The corpus is what carves the valleys. A domain present in the corpus becomes a low-loss valley the model can navigate fluently. A domain absent from the corpus produces flat, unmapped terrain.

Fine-tuning [Architecture]

Continued training on a smaller, targeted dataset after pre-training, typically with a lower learning rate. Fine-tuning adjusts the model's position within the pre-trained landscape — finding a lower-loss point for the specific task without rewriting the broad structure pre-training built. The danger: catastrophic forgetting, when fine-tuning on a narrow domain overwrites the weights that maintained performance in other domains.

RLHF (Reinforcement Learning from Human Feedback) [Architecture]

The dominant post-training alignment procedure. Human raters score model outputs; a reward model learns their preferences; the language model is fine-tuned to maximize reward. RLHF warps the landscape toward human-preferred valleys. The Mistral BASE/INSTRUCT falsification experiment tests whether RLHF flattens the landscape most in archaeological territory — which directly tests whether RLHF is remagnetization in the framework's vocabulary.

Tokenization [Architecture]

The process of converting raw text into discrete tokens — the units the model actually processes. A token is roughly a word, subword, or character depending on the tokenizer. Tokenization determines the resolution and structure of the input space. A domain that tokenizes poorly (produces many rare subword fragments) will generate higher perplexity not because the content is conceptually hard but because the token sequences are statistically sparse.

Cross-entropy [Architecture]

The loss function used to train language models. Measures the average number of bits needed to encode the actual next token given the model's predicted probability distribution. Cross-entropy is L(θ) — elevation in this vocabulary. Perplexity is its exponentiated form: PP = 2^H.
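A worked example in bits, using illustrative probabilities rather than model output:

```python
import math

def cross_entropy_bits(pred_probs):
    # pred_probs: probability the model assigned to each actual next token.
    # Returns the average number of bits needed to encode the observed
    # tokens under the model's predictions.
    return -sum(math.log2(p) for p in pred_probs) / len(pred_probs)

probs = [0.5, 0.25, 0.125]              # costs 1, 2, and 3 bits respectively
H = cross_entropy_bits(probs)           # average: 2.0 bits per token
assert abs(H - 2.0) < 1e-9
assert abs(2 ** H - 4.0) < 1e-9         # perplexity is the exponentiated form
```

A confidently wrong prediction is expensive: a token assigned probability 2⁻¹⁰ costs 10 bits on its own, which is why rare or contested continuations dominate the loss.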

Basin Connectivity [Global Geometry]

Whether two minima are joined by a low-loss path in parameter space. Connected basins admit smooth interpolation between solutions; isolated basins require climbing a high-loss barrier to move between them. A key open question for the Atlas framework: are the archaeological sinks isolated basins or branches of a connected low-loss manifold?

Symmetry Orbits [Global Geometry]

Families of weight configurations that implement the same function because of architectural symmetries. Permuting neurons, rescaling layers, or flipping sign conventions can move the model far in parameter space while leaving behavior unchanged. Distance in parameter space is not distance in function space without accounting for symmetry orbits first. Undermines the ablation drift vector: large displacement could be entirely within a symmetry orbit, representing no meaningful change.
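The permutation case is easy to demonstrate on a toy two-layer network: permuting the hidden units (rows of the first weight matrix and the matching columns of the second) moves the model in parameter space while leaving the function untouched. The matrices and permutation below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-layer MLP: y = W2 @ relu(W1 @ x), with 4 hidden units.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
relu = lambda z: np.maximum(z, 0)
f = lambda x, A, B: B @ relu(A @ x)

# Permute hidden units: rows of W1 together with the matching columns of W2.
perm = np.array([2, 0, 3, 1])
W1p, W2p = W1[perm], W2[:, perm]

x = rng.normal(size=3)
# Nonzero displacement in parameter space, zero displacement in function space:
assert np.allclose(f(x, W1, W2), f(x, W1p, W2p))
assert not np.allclose(W1, W1p)
```

This is why raw parameter-space distance between checkpoints is uninterpretable without first quotienting out (or aligning away) the symmetry orbit.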

Phase Transition [Global Geometry]

A regime shift where model behavior changes qualitatively — for example from memorization-like to generalization-like — as a control parameter (such as training time) passes a threshold while loss remains nearly smooth. Grokking is the canonical example. The local loss surface looks continuous; the behavioral change is discontinuous.

Algorithmic Regime [Global Geometry]

The class of internal computation a trained network is using at a fixed loss level. Two checkpoints can have similar loss and curvature while implementing different underlying algorithms. Regime is not visible to terrain qualifiers — it requires probing representational geometry.

Regime Switch [Global Geometry]

A training point where representation geometry or circuit structure reorganizes abruptly — indicating an algorithmic change — even though scalar metrics like loss move only slightly. The local Structural Integrity signature of what Global Geometry calls a phase transition. Formally: CKA(R_θ(t−), R_θ(t+)) ≪ 1 while |ΔL| stays small.
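The CKA comparison can be sketched with linear CKA (the Kornblith et al. formulation); the representation matrices below are random stand-ins for R_θ(t−) and R_θ(t+), not real checkpoint activations:

```python
import numpy as np

def linear_cka(X, Y):
    # Linear CKA between two representation matrices (samples x features),
    # after centering each feature. 1.0 = identical geometry up to rotation.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(1)
R = rng.normal(size=(50, 8))            # stand-in for R_theta(t-)

# Same geometry under an orthogonal rotation -> CKA stays near 1:
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
assert linear_cka(R, R @ Q) > 0.99

# Unrelated representations -> CKA well below 1 (a regime-switch signature):
assert linear_cka(R, rng.normal(size=(50, 8))) < 0.5
```

A regime switch is the second case observed across adjacent checkpoints: CKA(R_θ(t−), R_θ(t+)) drops sharply while the scalar loss barely moves.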