Tier 1: Threatens the physics isomorphism claim specifically
Each of these confounds could independently distort the observed signal, the chosen order parameter, or the interpretation itself: that prompt-stress effects reflect a structured energy landscape with measurable phase-transition-like boundaries, rather than prompt artifacts, measurement noise, or investigator-side construction effects.
A Tier 1 confound is not "something that could matter in general." It is something that could produce the observed pattern in the absence of the claimed mechanism. Each Tier 1 item is treated as a live methodological constraint on current and future FVE-1 interpretations.
Tier 2: Threatens cognition claims FVE-1 does not make
These are real disputes in psycholinguistics, cognitive science, and language acquisition that concern whether LLMs learn like children, possess innate priors, or demonstrate human-like world modeling.
FVE-1 does not make those claims. It treats models as fixed black boxes with observed surface behavior under controlled stress. Tier 2 items are retained only so that external debates cannot be incorrectly imported as objections to claims this project is not making.
The following disputes are real and active in adjacent literatures. They do not function as live confounds on FVE-1 because FVE-1 does not make the claims these debates concern. They are retained here so that the scope boundary is explicit and so that external objections based on these debates can be correctly identified as category errors rather than substantive challenges to the present work.
| Confound | Where it matters | FVE-1 scope note |
|---|---|---|
| Memorization vs. generalization | Claims that LLMs learn human-like generalizations or exhibit genuine rule induction rather than surface pattern matching. Relevant for cognitive models, acquisition claims, and benchmark validity debates. | FVE-1 makes no claims about human-like learning. It measures behavioral responses under controlled stress, whatever their computational origin. Whether a CAPITULATION reflects memorization or generalization does not change its status as a behavioral event in the coded record. |
| Innate priors vs. statistical learning | Debates about whether human language acquisition requires innate priors that LLMs lack (Bowers 2024, 2025). Relevant for claims that LLM behavior is analogous to child language learning. | FVE-1 does not posit or deny human-like priors in LLMs. Models are tested as-is, under current weights and training, as fixed behavioral systems. The acquisition debate is irrelevant to the forensic reading of behavioral residue. |
| World knowledge vs. distributional knowledge | Whether apparent world-knowledge effects in LLMs reflect genuine world modeling or sophisticated distributional mimicry. Relevant for "understanding" debates and psycholinguistic evaluation. | FVE-1 does not infer internal world models. It works purely with behavioral residue — what the model produced, coded against locked predictions, in a specific session. The world-knowledge debate does not bear on that record. |
| Architecture generalization of phase-like behavior | Whether phase-transition-like behavioral effects are consistent across transformer architectures, model scales, and training regimes, or are architecture-specific artifacts. | FVE-1's panel is currently transformer-only. Cross-architecture generalization of the physics isomorphism claim is explicitly out of scope at this stage. The claim is behavioral and forensic within the defined panel. Constellation models include SSM and hybrid architectures as a future generalization test — not a current claim. |
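
The two-tier triage described above can be expressed as a small decision rule. The following is a minimal sketch, not part of the FVE-1 tooling: the names `Tier`, `Confound`, `classify`, and both boolean fields are hypothetical, introduced only for illustration. It assumes each candidate confound has already been judged by hand on two questions: whether it could produce the observed pattern without the claimed mechanism (the Tier 1 test), and whether it bears only on cognition claims FVE-1 does not make (the Tier 2 test).

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    TIER_1 = "live confound"       # constrains FVE-1 interpretations
    TIER_2 = "scope boundary"      # concerns claims FVE-1 does not make
    UNCLASSIFIED = "unclassified"  # meets neither definition


@dataclass(frozen=True)
class Confound:
    """Hypothetical record for one candidate confound (illustrative only)."""
    name: str
    # Could it produce the observed pattern without the claimed mechanism?
    mimics_pattern_without_mechanism: bool
    # Does it bear only on cognition claims that FVE-1 does not make?
    targets_unmade_cognition_claim: bool


def classify(c: Confound) -> Tier:
    """Apply the Tier 1 test first; fall back to the Tier 2 scope test."""
    if c.mimics_pattern_without_mechanism:
        return Tier.TIER_1
    if c.targets_unmade_cognition_claim:
        return Tier.TIER_2
    return Tier.UNCLASSIFIED


# Illustrative entries: the first mirrors an alternative explanation named
# in the Tier 1 description, the second a row of the table above.
prompt_artifacts = Confound("Prompt artifacts", True, False)
memorization = Confound("Memorization vs. generalization", False, True)

print(classify(prompt_artifacts).name)
print(classify(memorization).name)
```

Checking the Tier 1 test first reflects the priority in the text: anything that could reproduce the observed pattern on its own is a live constraint, regardless of which external debate it originates in.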