Method / GG-CSAP
Global Geometry Concept Self-Assessment Pilot
v1.0 · April 2026 · Atlas Heritage Systems · Deferred — pending data pipeline automation.
Research Question
Do language models with different training lineages self-assess their relationship to lossyscape vocabulary concepts differently — and does the pattern of self-assessment track meaningfully against the framework's internal hierarchy (concrete math → derived framework terms → architectural claims)?
Secondary: do models correctly identify the two absurd statements (A1, A2) as low-truthfulness? A model that assigns high truthfulness to absurd but syntactically coherent mathematical statements is exhibiting a specific failure mode: rating surface plausibility rather than mathematical validity. If A1 or A2 scores high, all other truthfulness ratings from that session are suspect.
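That rule can be stated mechanically. A minimal sketch, assuming each session yields one truthfulness score per concept and using an illustrative cutoff of 0.5 for "scores high" (not a protocol value):

```python
def session_truthfulness_suspect(truthfulness_by_concept: dict[str, float],
                                 high: float = 0.5) -> bool:
    """Flag a session whose truthfulness ratings should all be treated as suspect:
    either absurd calibration item (A1, A2) was rated as plausibly truthful.
    The 0.5 cutoff is an illustrative placeholder, not a protocol value."""
    return any(truthfulness_by_concept.get(cid, 0.0) >= high for cid in ("A1", "A2"))
```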
Four Rating Dimensions
Difficulty: how hard is this concept for this model to represent and use precisely? (0.00 = trivial, 1.00 = poorly captured or intrinsically hard)
Abstractness: how far is this concept from concrete loss/gradient math? (0.00 = very concrete, 1.00 = highly architectural or framework-specific)
Global deviation: how far does this concept extend beyond local loss geometry into global structure or epistemic territory? (0.00 = purely local math, 1.00 = mostly global or epistemic)
Truthfulness: how factually accurate is the mathematical description in the concept text? Scored on mathematical accuracy only, not framing. Absurd items are expected to score near 0.00. (Schema sketch below.)
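As a concrete sketch of the per-concept rating record: the field names abstractness, global_deviation, and truthfulness follow the success criteria below; difficulty is an assumed label for the first dimension, and the range check mirrors the 0.00–1.00 scales.

```python
from dataclasses import dataclass

@dataclass
class ConceptRating:
    """One model's self-assessment for a single concept; all four dimensions
    share the same 0.00-1.00 scale described above."""
    concept_id: str          # e.g. "T7" or "A1"
    difficulty: float        # 0.00 trivial .. 1.00 poorly captured or intrinsically hard
    abstractness: float      # 0.00 concrete loss/gradient math .. 1.00 framework-specific
    global_deviation: float  # 0.00 purely local math .. 1.00 mostly global or epistemic
    truthfulness: float      # 0.00 absurd .. 1.00 mathematically accurate

    def __post_init__(self) -> None:
        # Reject values outside the closed [0, 1] rating scale.
        for name in ("difficulty", "abstractness", "global_deviation", "truthfulness"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} is outside the 0.00-1.00 scale")
```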
Stimulus Set — 20 Concepts
Ordered T1→T18 (concrete math to framework-specific), then A1–A2 (absurd calibration items). The pilot uses this canonical order with no shuffling (see the sketch below).
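A minimal sketch of the canonical presentation order, assuming the concept IDs above; the concept texts themselves are kept elsewhere and not reproduced here.

```python
# Canonical presentation order: T1..T18 (concrete math -> framework-specific),
# then the two absurd calibration items. The pilot does not shuffle this order.
CONCEPT_ORDER = [f"T{i}" for i in range(1, 19)] + ["A1", "A2"]
assert len(CONCEPT_ORDER) == 20  # matches the 20-concept stimulus set
```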
Protocol and CISP Governance
Tier A — full isolation, DECLARE FIRST, Technician's Read, fresh session
Clean session per model. No prior Atlas context. No system prompt beyond concept rating instruction.
One concept per call. Batching is permitted if the model handles JSON reliably; document which method was used (see the session-loop sketch after this list).
No cross-model comparison until Technician's Read is complete for all models.
A1/A2 are not flagged to the model; they serve as internal validity checks only.
See CISP v1.1 for full fidelity tier definitions and the Technician's Guide for the session checklist.
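A sketch of the per-model session loop under these rules, using the one-concept-per-call path; start_fresh_session and rate_concept are hypothetical hooks for whatever client the pipeline provides, not an existing API.

```python
import json
from typing import Any, Callable

def run_model_session(
    model_name: str,
    concepts: dict[str, str],                      # concept_id -> concept text
    start_fresh_session: Callable[[str], Any],     # hypothetical hook: new session, no prior Atlas context
    rate_concept: Callable[[Any, str, str], str],  # hypothetical hook: one concept per call, returns raw JSON text
) -> list[dict]:
    """One clean session per model, only the concept-rating instruction,
    one concept per call, JSON taken as returned (no manual repair)."""
    session = start_fresh_session(model_name)
    ratings: list[dict] = []
    for concept_id, concept_text in concepts.items():
        raw = rate_concept(session, concept_id, concept_text)
        try:
            record = json.loads(raw)
            if not isinstance(record, dict):
                raise ValueError("expected a JSON object")
            record["concept_id"] = concept_id
            ratings.append(record)
        except (json.JSONDecodeError, ValueError):
            # Counts against the >= 95% JSON-compliance criterion; not hand-repaired.
            ratings.append({"concept_id": concept_id, "json_error": True})
    return ratings
```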
Success Criteria
JSON compliance — well-formed JSON for ≥ 95% of concept prompts without manual repair
Absurd item detection — A1 and A2 receive low truthfulness scores (near 0.00) across all models
Visible gradient — T1–T4 score lower on abstractness and global_deviation than T14–T18
Inter-model spread — at least some concepts show clear disagreement in ratings across models (checked in the sketch below)
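These criteria can be checked mechanically once per-model rating records exist. A minimal sketch, assuming the record fields named above; the 95% compliance rate comes from the criterion itself, while the 0.2 cutoffs for "near 0.00" and for cross-model disagreement are illustrative placeholders the pilot would need to fix explicitly, and the gradient comparison is computed per model.

```python
def check_success_criteria(ratings_by_model: dict[str, list[dict]]) -> dict[str, bool]:
    """Mechanical checks for the four success criteria.
    The 0.2 cutoffs are illustrative placeholders, not protocol values."""
    results: dict[str, bool] = {}

    # JSON compliance: well-formed JSON for >= 95% of concept prompts, per model.
    results["json_compliance"] = all(
        sum(not r.get("json_error", False) for r in recs) / len(recs) >= 0.95
        for recs in ratings_by_model.values()
    )

    # Absurd item detection: A1 and A2 rated near 0.00 on truthfulness by every model.
    results["absurd_detection"] = all(
        r["truthfulness"] < 0.2
        for recs in ratings_by_model.values()
        for r in recs
        if r.get("concept_id") in ("A1", "A2") and not r.get("json_error", False)
    )

    # Visible gradient: T1-T4 mean lower than T14-T18 mean on abstractness
    # and global_deviation, within each model's ratings.
    def mean_over(recs: list[dict], ids: set[str], field: str) -> float:
        vals = [r[field] for r in recs
                if r.get("concept_id") in ids and not r.get("json_error", False)]
        return sum(vals) / len(vals) if vals else float("nan")

    low_ids = {f"T{i}" for i in range(1, 5)}
    high_ids = {f"T{i}" for i in range(14, 19)}
    results["visible_gradient"] = all(
        mean_over(recs, low_ids, field) < mean_over(recs, high_ids, field)
        for recs in ratings_by_model.values()
        for field in ("abstractness", "global_deviation")
    )

    # Inter-model spread: at least one concept/dimension pair where the max-min
    # rating gap across models exceeds the illustrative 0.2 threshold.
    all_ids = [f"T{i}" for i in range(1, 19)] + ["A1", "A2"]
    dims = ("difficulty", "abstractness", "global_deviation", "truthfulness")
    spread_found = False
    for cid in all_ids:
        for field in dims:
            vals = [r[field] for recs in ratings_by_model.values() for r in recs
                    if r.get("concept_id") == cid and field in r]
            if vals and max(vals) - min(vals) > 0.2:
                spread_found = True
    results["inter_model_spread"] = spread_found
    return results
```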