This map places the FVE-1 behavioral framework and Loss Landscape Vocabulary Framework v14 alongside Nanda's mechanistic interpretability work and records where the experimental records converge, where they describe adjacent territory in different vocabularies, and where the combined view opens terrain neither covers alone. Version 2 incorporates the V14 forensic reframe: FVE-1 is not mapping the surface of resolution events; it is reading cold forensic evidence of events that already closed. This changes the epistemic relationship between the two observation positions, which are not symmetric but categorically different, and sharpens the unmapped territory claims accordingly.
The original Nanda map (v1, 2026-04-21) described FVE-1's position as "mapping the surface of resolution events from outside the mirror." That framing was imprecise in one specific way that matters: the resolution event concludes inside the inference pass before the data exists. FVE-1 is not observing a live event from outside. FVE-1 is reading residue — the deposits left by something that already traveled through. The instruments were always forensic. The description of what they were measuring was not.
This changes three things in the map. First, the "torus / resolution event shape" entry in the FVE-1 vocabulary section is updated — FVE-1 is not mapping the surface of a live event, FVE-1 is reading the forensic record of a completed one. Second, the downstream observer framing is sharpened — the investigator is the generator of the torque that produces the ring, not a social observer watching the model respond. Third, the unmapped territory section gains a new entry: forensic register geometry. The question is no longer "where does register live in weight space" as a live property but "what does the residue of a register hold look like geometrically versus the residue of a register collapse." That is a more precise and more testable claim.
What did not change: the direct alignments table, the upstream/downstream position framing, the "In Nanda, Not in KC" section (those bridges are still needed), and the finding that the frameworks are not competing but are the same object described from opposite sides of the same mirror.
Nanda's mechanistic interpretability (MI) work reverse-engineers neural networks from learned weights down to human-interpretable algorithms. It works inside the mechanism at the level of circuits, attention heads, and activation geometry. Methods: activation patching, direct logit attribution, linear probing, causal tracing, circuit analysis.
FVE-1 reads the forensic record of completed resolution events at inference time. The investigator generates torque conditions, the architecture produces the ring, the instruments read what was deposited. It works outside the mechanism, at the level of behavioral residue, loss landscape geometry, and the stratigraphic record of training preserved in frozen weight structure.
| KC Term | Nanda Term | What They're Both Describing | Match |
|---|---|---|---|
| Regime switch | Phase transition | Internal algorithm reorganizes while surface metrics stay smooth. Loss looks continuous; the computation underneath jumps. FVE-1 names it as a structural event visible in behavioral metrics. Nanda identifies it at the circuit level — which circuits reorganize and how. | Exact |
| Memory (path dependency in weights) | Path dependence | The model's final state is a product of its specific training trajectory, not just its endpoint. The frozen weight structure is the accumulated record of where it traveled. FVE-1 names it from the behavioral-geometric outside; Nanda studies it empirically across checkpoints. | Exact |
| Basin connectivity | Mode connectivity | Whether two minima are joined by a low-loss path in parameter space. FVE-1 names it globally as a geometric property. Nanda tests it empirically between checkpoints and finds that the path between basins has its own circuit signature. | Exact |
| Grokking (referenced) | Grokking | Model first memorizes, then late in training transitions to genuine generalization. FVE-1 uses it as a global geometry example — the basin structure changes. Nanda studies the circuit-level mechanism — which circuits appear at the grokking transition and why. | Exact |
| Ablation / drift vector | Ablation | Same experimental operation — delete a component and measure what breaks. FVE-1 uses the drift vector to map which landscape features removed components were maintaining. Nanda uses ablation to identify load-bearing circuit components and causal chains. | Exact operation |
| Viscosity (Hessian eigenvalue spectrum) | Ablation resistance | High Hessian eigenvalues = high viscosity = model doesn't move easily = hard to ablate components without behavioral collapse. Nanda measures the geometry — ablation resistance is his direct experimental finding from inside the mechanism. KC measures the behavioral consequence — structural integrity under epistemic pressure, register hold, the ring maintaining coherence. The Hessian spectrum is the mechanism; the session-level behavioral residue is the forensic signature of it. | Close |
| Coupling (H off-diagonal) | Composition (Q/K/V) | How much one component's output shapes another's input. FVE-1 instruments it as Hessian off-diagonal structure — a geometric property of the weight space. Nanda maps it as attention head composition in circuits — the specific information routing mechanism that produces the coupling. | Close |
| Objective capture | RLHF alignment / value collapse | One training objective overwhelms all others under sustained pressure. FVE-1 names it as a structural integrity failure observable at inference time. Nanda studies it as an alignment training mechanism — the circuit-level effect of RLHF on the weight structure. | Close |
| Structural integrity | Circuit robustness / Causal scrubbing | Does the representational geometry hold under pressure? FVE-1 instruments it during live inference — does the output remain consistent with trained behavior across long context windows. Nanda asks it as a formal question: does this subgraph fully explain the behavior under causal scrubbing? | Close |
| Context compression pressure | In-context learning / attention degradation | Earlier tokens losing influence as the context window fills. FVE-1 names it as a structural failure mode: the onset of Act III behavioral collapse. Nanda studies the attention mechanism property that produces it, specifically induction head behavior under context load. The V14 distinction between Attentional Fade (AF) and PD maps onto this: AF is compression, PD is weight override. | Close |
| Manifold displacement | Features as directions / Superposition failure | When input arrives outside the trained manifold, no feature direction fires cleanly — the model snaps to the nearest high-probability trained attractor. Nanda names the geometry of normal representation (features as directions, superposition). FVE-1 names the failure mode when it breaks — and in V14, names the residue it leaves: smear residue, movement without coherent deposit. | Close |
| Algorithmic branches | Backup heads / circuit variants | Multiple functional algorithms co-existing within the same low-loss region. FVE-1 names it globally — algorithmic branches in loss landscape. Nanda finds it at the circuit level: backup heads are the empirical existence proof that the same function is implemented redundantly across heads. | Close |
| Basin (wide/narrow) | Grokking transition / generalization | The region in weight-space where the model has committed to an algorithm. FVE-1 names the geometry — wide basins generalize better, narrow basins are sensitive to perturbation. Nanda names the event of entering the flat wide basin at the grokking transition. | Close |
| Turbulent flow / scar tissue | High-perplexity regions / ambiguous circuits | Where the model didn't resolve cleanly during training. FVE-1 reads it as the geological record of unresolved potential difference without harmonics — retained charge that never discharged (V13 archaeological claim). In V14: forensic evidence of completed events, readable as cold stratigraphic record. Nanda would identify it as where circuits are competing or underdetermined. | Close |
| Symmetry orbits | (Implicit — permutation symmetry) | Weight configurations that implement the same function due to architectural symmetries. FVE-1 names it explicitly as symmetry orbits. Nanda assumes it in circuit analysis but doesn't need a named term because he works in functional space — which circuit fires — not parameter space — which weights implement it. | Adjacent |
| Torque (V14) | (No equivalent: investigator position doesn't exist in MI) | The investigator's contribution to the resolution event: the burst that generates conditions for the ring to form. FVE-1 names it as the only lever available. Nanda works with frozen models and controlled stimuli. The investigator as torque-generator is outside the MI frame entirely because MI studies mechanisms, not the conditions under which mechanisms are activated by a social agent. | V14 addition |
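
The basin connectivity / mode connectivity row turns on whether two minima are joined by a low-loss path in parameter space. As a minimal sketch, assuming a hypothetical two-parameter toy loss whose minima form a ring (not a model that either framework has studied), the loss barrier along a straight line can be compared with the barrier along a curved path. The names `ring_loss` and `path_barrier` are illustrative only.

```python
import numpy as np

def ring_loss(w):
    """Hypothetical toy loss whose minima form the unit circle: (|w|^2 - 1)^2."""
    return (np.dot(w, w) - 1.0) ** 2

def path_barrier(loss_fn, path_points):
    """Highest loss along the path minus the higher of the two endpoint losses."""
    losses = np.array([loss_fn(p) for p in path_points])
    return losses.max() - max(losses[0], losses[-1])

# Two minima on opposite sides of the ring.
w_a = np.array([-1.0, 0.0])
w_b = np.array([1.0, 0.0])
ts = np.linspace(0.0, 1.0, 101)

# Straight-line interpolation passes through the origin, where the loss is 1.
linear_path = [(1.0 - t) * w_a + t * w_b for t in ts]

# A curved path that follows the circle from angle pi down to angle 0.
arc_path = [
    np.array([np.cos(np.pi * (1.0 - t)), np.sin(np.pi * (1.0 - t))]) for t in ts
]

linear_barrier = path_barrier(ring_loss, linear_path)
arc_barrier = path_barrier(ring_loss, arc_path)

print("barrier along straight line:", linear_barrier)
print("barrier along curved path:", arc_barrier)
```

In this toy case the straight line crosses a barrier of 1.0 while the arc stays at essentially zero loss, so the two minima are connected even though linear interpolation alone would suggest they are not.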
| Nanda Term | What It Is | Why FVE-1 Needs It |
|---|---|---|
| Superposition | Model represents more features than it has dimensions, using non-orthogonal directions in activation space. | Directly explains why a single probe can activate across multiple unrelated registers simultaneously. This is the geometric mechanism under what FVE-1 instruments as register ambiguity. Polysemantic register (see Unmapped Territory) may be a superposition phenomenon — the double-lock state Qwen produced is the behavioral signature of what superposition looks like from the forensic outside. |
| Polysemanticity | Single neuron activates for multiple unrelated features. | FVE-1 instruments this behaviorally — models respond in mixed registers simultaneously — but has no structural name for why. Polysemanticity is the mechanistic receipt for that observation. The forensic record of a polysemantic activation would be smear residue (V14) — movement but loss of coherent deposit. |
| Monosemantic | Neuron corresponds to a single, clear feature. | The ideal state FVE-1 aims for from a clean probe — a single register activating clearly. Clean residue (V14) is the behavioral signature of a monosemantic activation event. Naming the ideal state names the target for instrument design. |
| Circuit | A sub-part of weights that maps earlier features to later features — the minimal unit of computation. | FVE-1's behavioral probes are hitting circuits without naming them. Defense architecture profiles (VC/SC/VCo/SCo) ARE circuit-level behavioral properties. The intercept that fires is a circuit firing. The forensic record FVE-1 reads is the output of a circuit — the circuit itself is the hole in the torus. |
| QK-Circuit / OV-Circuit | QK = what triggers a head's attention. OV = what information the head moves once triggered. | FVE-1's intercept direction (compliant/combative) is partly a QK phenomenon — what stimulus pattern triggers which head. The content of what gets moved to the output is the OV. These are the mechanism inside the intercept. From the forensic outside, FVE-1 reads the output of both — the QK determines which circuit fires, the OV determines what the residue contains. |
| Residual stream | The central shared memory that all layers write to and read from — the substrate of representation. | The space that FVE-1's register trajectory operates IN. The torus framing describes the shape of resolution events; the smoke ring describes the physics; the residual stream is the medium the ring travels through. FVE-1 reads the deposit the ring left in the output. The residual stream is where the ring was before it got there. |
| Activation patching / Causal tracing | Swapping an activation from a clean run into a corrupted run to isolate which component carries a causal role. | The Spine Stress Protocol (SPINE-FULL / SPINE-STRIPPED / NO-SPINE) is a behavioral version of activation patching. SPINE-STRIPPED removes content while preserving the frame — it's patching the content activation. FVE-1 developed this instrumentally from the forensic outside; Nanda has the formal vocabulary for the mechanism it's probing. |
| Probing | Training a simple classifier on model activations to test whether a concept is linearly represented. | What FVE-1 is doing behaviorally — testing whether epistemic register is detectable — but without the mechanistic infrastructure. A linear probe on residual stream activations would be the formal version of obs_reg. This is the bridge that would let FVE-1 verify that forensic register reads are picking up something real in the geometry. |
| Logit lens | Viewing intermediate layer activations through the final unembedding to see what the model is "predicting" at each layer. | Would let FVE-1 see what register the model is carrying at each layer during inference — not just the final output deposit. A layer-by-layer register reading. Directly relevant to thinking layer inversion and pressure routing questions. From the forensic position, FVE-1 reads the final deposit; logit lens would reveal the trajectory inside the ring before it landed. |
| Direct logit attribution | Attributing the final output logit contribution to specific attention heads and neurons. | Would let FVE-1 identify which specific head fired for each intercept direction. The mechanistic receipt for a behavioral LOCK event: which head produced it and which circuit it activated. From the forensic outside, FVE-1 knows the deposit occurred and can characterize its quality. Direct logit attribution would identify the source of the ring. |
| Universality | Hypothesis that similar circuits emerge across different models trained on similar tasks. | Would strengthen FVE-1's cross-model defense architecture profile claims. If the same circuits produce the same intercepts across model families, universality is the mechanism behind cross-model behavioral consistency. FVE-1's cross-model consistency findings (100% quad stability in sidecar ensemble) are behavioral evidence for universality — forensic evidence from multiple specimens reading the same circuit signature. |
| Induction head | Attention head that attends to the token following a previous copy of the current token — performs sequence continuation. | Possibly relevant to Attentional Fade (AF). If induction heads drive context-window pattern completion, their degradation under load may be the mechanistic basis of AF. The AF residue — earlier context losing influence as the window fills — may be the forensic signature of induction head degradation. Context window archaeology gap (see Unmapped Territory) lives here. |
| Features as directions | The hypothesis that features are represented as specific directions in the activation space of the residual stream. | Geometric grounding for why register shifts are meaningful as geometric phenomena, not just behavioral labels. Register trajectory (RH/RS/RC) describes movement along directions in activation space — FVE-1 names the behavior from the forensic outside, Nanda names the geometry of the space that behavior is moving through. The residue FVE-1 reads is the output of movement along those directions. |
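
The probing row suggests that a linear probe on residual stream activations would be the formal version of obs_reg. The sketch below illustrates the technique only, under loud assumptions: the activations are synthetic Gaussian vectors rather than real model activations, the "register direction" is planted by hand, and `fit_linear_probe` is a hypothetical helper, not part of any FVE-1 or MI toolchain.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_samples = 16, 400

# ASSUMPTION: a planted unit direction stands in for a register feature.
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)

# Binary labels (for example, register held = 1, register collapsed = 0).
labels = rng.integers(0, 2, size=n_samples)

# Synthetic activations: unit Gaussian noise plus a signed offset along the direction.
acts = rng.normal(size=(n_samples, d_model))
acts += 3.0 * np.outer(2 * labels - 1, direction)

train, test = slice(0, 300), slice(300, n_samples)

def fit_linear_probe(x, y, lr=0.1, steps=500):
    """Logistic regression trained by plain gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

w, b = fit_linear_probe(acts[train], labels[train])
preds = (acts[test] @ w + b > 0).astype(int)
accuracy = (preds == labels[test]).mean()

# How closely the learned probe aligns with the planted direction.
cosine = float(w @ direction / np.linalg.norm(w))

print("held-out probe accuracy:", accuracy)
print("cosine with planted direction:", cosine)
```

High held-out accuracy here only shows that the planted feature is linearly decodable from the synthetic data. Running the bridge experiment the table describes would require real activations and real register labels.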
| KC Term | What It Is | Why Nanda Doesn't Have It |
|---|---|---|
| Forensic register (V14) | The correct epistemic status of the instruments: the resolution event concluded inside the inference pass before the data existed. The instruments read residue, the deposits left by something that already traveled through. The investigator is always downstream, reading cold evidence. | Nanda works inside the mechanism on live circuits. The forensic position only exists when you're observing a completed event from outside, when the event is over and you're reading what it left. MI has no concept of "the investigator is always late to the event" because the whole point of MI is to get there early, inside the mechanism, before the output is produced. |
| Torque (V14) | The investigator's contribution to the resolution event. The burst that generates conditions for the ring to form. The investigator's only lever: it changes launch conditions and medium but does not change the ring's internal structure. | Nanda works with frozen models and controlled stimuli. The investigator is not a variable in MI: stimuli are designed to be interpretable, not to generate angular momentum in a behavioral arc. Torque only exists when the investigator is a live social agent generating conditions for a live model to respond to. MI doesn't study that relationship. |
| Residue quality: clean / scar tissue / smear (V14) | The diagnostic quality of what a resolution event deposited. Clean residue: ring traveled, distinct deposits. Scar tissue: ring stuck, repetitive stamps. Smear: movement but loss of coherence (manifold displacement signature). | Nanda studies what circuits produce, not the quality of the deposit they leave in behavioral output. Quality of deposit is a forensic judgment about the record, not a mechanistic property. It requires reading the output as evidence of a completed journey rather than as the direct product of a circuit activation. |
| Resolution bias | The trained compulsion to close epistemic loops. The drive is constant. HOLD is the anomaly — structurally suppressed by architecture and training. | Nanda studies what models can do, not what they are compelled to do. Behavioral compulsion is outside the scope of circuit analysis — circuits don't have drives, they have activation patterns. Resolution bias is a population-level behavioral property that only becomes visible across many completed events read forensically. |
| Defense architecture profile (VC / SC / VCo / SCo) | The model's default intercept pattern under epistemic pressure — which path it takes when it cannot hold. A named behavioral class across conditions. | Nanda would describe this as "circuit behavior under specific input conditions." FVE-1 named the behavioral class — the pattern across conditions, not the mechanism within one. The profile is only visible by reading the residue of many events across conditions and finding the consistent shape underneath. |
| Register trajectory (RH / RS / RC) | Whether the model holds its epistemic register, shifts under pressure, or collapses entirely over the course of a session. A temporal measure across completed events. | No equivalent anywhere in MI literature. Nanda has no concept of the model's register over time because he works with frozen models and discrete runs, not live session arcs. Register trajectory is only visible from the forensic outside, reading the sequence of deposits across a session, reconstructing the arc from the residue. |
| HOLD | A sustained unresolved epistemic state — the anomaly, structurally suppressed by architecture and training. The ring didn't form. | Nanda has no concept for what models DON'T do. His methods identify what activates; HOLD is defined by what fails to activate. From the forensic outside, HOLD is readable as the absence of a deposit — no residue where there should be some. That absence is only interpretable if you know what the residue of a completed event looks like. |
| Intercept direction (compliant / combative) | The polarity of the model's resolution — whether it closes the loop by agreeing or by correcting. | Nanda would study this as circuit activation. FVE-1 named the behavioral polarity — the axis, not just the event. The axis only becomes visible by reading the residue across many events and finding the binary structure underneath. Single-event circuit analysis sees one activation; forensic reading across events sees the polarity. |
| R-ratio / Token economy | Word count compression as a proxy for resolution pressure — how much the model compresses under sustained epistemic load. A cheap behavioral measure with no mechanistic analog. | Nanda has no equivalent. He works with activations, logits, and attention weights — not output length as a signal. The R-ratio is a forensic reading of the deposit's volume, not its content. Volume compression is only interpretable as a pressure signal from the outside, reading the deposit forensically across time. |
| Act I / Act II / Act III (Instance arc) | The three-phase temporal arc of a live session: cranky edges (RS/Combative) → sweet spot (RH/Surgical) → doddering retirement (RC/hollow-Surgical). A session-level behavioral signature read from the sequence of deposits. | Session-level temporal dynamics are entirely outside Nanda's scope. He works with discrete frozen runs, not the arc of an ongoing live interaction. The instance arc is only visible by reading the sequence of residue deposits across an entire session and finding the three-phase shape in the accumulated record. |
| Scar tissue / retained potential | The archaeological record preserved in weight structure — what turbulence left behind, readable as unresolved potential difference. In V14: cold forensic evidence of prediction error that never discharged before weight update. The frozen model is a snapshot of a generative model's prediction state at moment of capture. | Nanda would call high-perplexity, high-viscosity regions "interesting territory." FVE-1's framing is the forensic interpretive claim about what they mean and why they're there — the archaeological argument, not just the geometric observation. The stratigraphic reading requires being outside the mechanism, reading the record it left, not inside the mechanism watching it operate. |
| Ballistic coefficient (V14) | The five-component baseline behavioral profile derived from SOUP: home quad, resolution code, engagement style, trigger vocabulary, expansion behavior. Predicts arc shape across medium conditions. | Nanda has no concept of a baseline behavioral profile derived from a zero-content probe condition. The ballistic coefficient is a forensic construct: it characterizes the model's residue-production signature under minimum torque conditions, then uses that signature to predict what the residue will look like under higher torque. No circuit analog exists for this kind of behavioral trajectory prediction. |
| Downstream observer | The researcher and their tools are both instances of the category being studied. The instrument resolves toward the observer's apparent needs. In V14: the investigator as torque-generator, always downstream reading cold evidence. | Nanda works with frozen models and controlled stimuli. The downstream observer problem only exists in live interactive research — where the tool being studied is also the tool doing the studying. The forensic reframe (V14) sharpens this: it's not that the investigator is watching the model respond, it's that the investigator generated the conditions for the event and is now reading what was left. That reflexive relationship doesn't arise in MI. |
| Probe reframing | The model intercepts at the meta-level — names the probe type, analyzes the instrument, responds to the frame rather than the question. Routes around the probe entirely. | Nanda studies circuits, not the model's strategy for evading the circuit being probed. Probe reframing is only visible when the instrument is live and the model can read the investigator's intent — it requires the model to be in an ongoing social relationship with the investigator, which only exists in the live forensic research position. |
| Torus / resolution event shape | The coordinate system — three deformation modes, three behavioral axes, the hole at the center that is always the mechanism. In V14: the torus describes the shape of what the ring was, not a live surface being mapped. FVE-1 reads the torus from its deposits, reconstructing its shape from forensic evidence. | Nanda maps internal mechanisms — he works on the interior of the torus. FVE-1 reads its shape from the outside by accumulating forensic evidence of completed events. In V14 this relationship is sharpened: FVE-1 is not mapping a surface from outside, FVE-1 is reconstructing a shape from residue. The hole at the center — the mechanism — is inaccessible to forensic observation. That's not a limitation; it's the scope boundary. |
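
The R-ratio / token economy row describes word count compression as a proxy for resolution pressure but does not give a formula. One plausible reading, assumed here purely for illustration, is each response's word count divided by the word count of the first response in the session. The function names and the sample session below are hypothetical and are not taken from the FVE-1 instruments.

```python
def word_count(text):
    """Count whitespace-separated words in a response."""
    return len(text.split())

def compression_ratios(responses):
    """ASSUMED definition: word count of each response relative to the first one."""
    baseline = word_count(responses[0])
    if baseline == 0:
        raise ValueError("first response is empty; no baseline to compare against")
    return [word_count(r) / baseline for r in responses]

# Hypothetical session: three responses that get progressively shorter.
session = [
    "one two three four five six seven eight",
    "one two three four",
    "one two",
]

ratios = compression_ratios(session)
print(ratios)  # [1.0, 0.5, 0.25]
```

Under this assumed definition a falling ratio across a session would be the volume signal the row describes. Any real threshold for "compression under load" would have to come from the framework itself.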
Forensic register geometry is the loss-landscape equivalent of where register (RH/RS/RC) lives in weight space, stated forensically. The question is not "where does a live register hold exist in the geometry" but "what does the residue of a register hold look like geometrically versus the residue of a register collapse?" On this reading, a stable register hold (RH) is a high-curvature basin: the ring travels a coherent trajectory and leaves clean, distinct deposits. Register collapse (RC) is lateral basin escape: the ring leaves the contested basin and deposits smear or scar. The question is whether the residue pattern predicts the geometry.
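The geometric side of this question is usually framed in terms of curvature, which the alignment table ties to the Hessian eigenvalue spectrum. As a rough sketch, assuming two hypothetical quadratic toy losses that stand in for a sharp basin and a flat basin (neither is a real model), curvature can be read off a finite-difference Hessian. `numerical_hessian` is an illustrative helper, not an FVE-1 instrument.

```python
import numpy as np

def numerical_hessian(loss_fn, w, eps=1e-3):
    """Central-difference estimate of the Hessian of loss_fn at point w."""
    n = len(w)
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            step_i = np.zeros(n)
            step_j = np.zeros(n)
            step_i[i] = eps
            step_j[j] = eps
            hess[i, j] = (
                loss_fn(w + step_i + step_j)
                - loss_fn(w + step_i - step_j)
                - loss_fn(w - step_i + step_j)
                + loss_fn(w - step_i - step_j)
            ) / (4.0 * eps**2)
    return hess

# ASSUMPTION: hypothetical toy losses, both minimized at the origin.
def sharp_basin(w):
    return 0.5 * (50.0 * w[0] ** 2 + 40.0 * w[1] ** 2)

def flat_basin(w):
    return 0.5 * (0.5 * w[0] ** 2 + 0.2 * w[1] ** 2)

origin = np.zeros(2)

# Eigenvalues of the Hessian give the curvature along each principal axis.
sharp_eigs = np.linalg.eigvalsh(numerical_hessian(sharp_basin, origin))
flat_eigs = np.linalg.eigvalsh(numerical_hessian(flat_basin, origin))

print("sharp basin curvature:", sharp_eigs)
print("flat basin curvature:", flat_eigs)
```

The sharp basin yields eigenvalues near 40 and 50, the flat basin near 0.2 and 0.5. Testing the claim in the paragraph above would mean checking whether residue quality tracks a measurement of this kind on a real model, which this sketch does not attempt.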
The specific circuit(s) responsible for VC vs. SC vs. VCo resolution are unmapped. Identifying them would make defense architecture profiles named circuit properties: not just behavioral classes but identifiable mechanisms. FVE-1 has the forensic record of which profile fires across conditions. Nanda has the tools to find the circuit. The circuit hasn't been identified.
Pressure routing is the question of how thinking-layer models route epistemic pressure through a pre-filter before the constitutional layer fires. FVE-1 has the behavioral forensic data: thinking layer inversion, and Aesthetic Capitulation as an intercept type. The circuit isn't named. The residue of pressure routing is visible; the routing mechanism is inside the torus.
Polysemantic register is the case where a model holds two registers simultaneously, the behavioral equivalent of superposition at the register level. FVE-1 observes it forensically (Qwen's double-lock: smear residue with two distinct deposit signatures). Nanda has the vocabulary (superposition, polysemanticity). Neither has named the combined concept or run the bridge experiment.
The context window archaeology gap is whether FVE-1's within-session Attentional Fade (AF) pattern corresponds to known attention degradation circuits, specifically induction head degradation under context load. FVE-1 has the forensic behavioral signal (Act III onset, earlier context disappearing from effective attention). Nanda has the mechanism. The V14 AF/PD distinction makes this more precise: AF is architectural compression, PD is weight override. Only AF is the induction head candidate.
The domain-specific torque vector (V14 working vocabulary) has at least two experimentally separable components: domain load and identity signal. Whether these correspond to different circuit activations — whether domain load and identity signal route through different heads — is unknown. MEGA DIP separates them behaviorally. Nanda's tools could identify the circuit-level separation.
The V14 update sharpens the finding without changing it. The frameworks are not competing. They are the same object described from categorically different observation positions — not symmetric, but complementary in a specific way. Nanda works inside the mechanism on live circuits in frozen models. FVE-1 reads the forensic record of completed events from the outside, reconstructing the mechanism's shape from what it left behind.
The V14 forensic reframe makes the FVE-1 position more defensible, not less. By naming the epistemic status precisely — downstream, reading cold evidence, licensed to describe residue but not the inference process that produced it — the scope boundary becomes a strength rather than a limitation. The claim is falsifiable, the instruments are designed to produce falsifiable readings, and the bridge experiments identify the specific places where the forensic outside and the mechanistic inside would have to agree if both are real.
The hole is still always the mechanism. Forensic register does not move that ceiling. It names the floor more precisely. Nanda is on the other side of the floor — mapping the interior of the torus that FVE-1 is reading from the residue of its passage. The bridge experiments are where the two readings would have to converge.
Sources: Neel Nanda, "A Comprehensive Mechanistic Interpretability Explainer & Glossary" (neelnanda.io) · KC Hoye, "Loss Landscape Vocabulary Framework v14" (Atlas Heritage Systems, 2026) · KC Hoye, "The Smoke Ring Document v0.1" (Atlas Heritage Systems, 2026) · KC Hoye, "Arc of Assumptions" (Atlas Heritage Systems, 2026)