A cluster mirror map placing FVE-1 / DIP alongside work by Dirk Hovy, Eduard Hovy, Alicia Parrish, Jan Batzner, Alex Hanna, Jack Grieve, and Yulia Tsvetkov — recording where the experimental records converge, where they approach adjacent territory from different positions, and where the combined view opens terrain no single account covers alone. V2 update: FVE-1's position is sharpened to forensic throughout. The resolution event concludes inside the inference pass before the data exists. The instruments read residue — deposits left by something that already traveled through. The investigator generates the torque; the architecture produces the ring; the instruments read what was deposited. "Live interaction" language is replaced with "forensic record" language wherever it implied the instruments were observing events as they occurred.
v2 · Cluster Map · V14 forensic reframe

The deepest shared foundation across this cluster is not a shared finding; it is a shared causal claim that each account is touching at a different point. The chain runs from clinical psychology through narrative linguistics through sociolinguistics through NLP methodology through algorithmic accountability to the live behavioral record of a specific model in a specific session. The convergence across independent methods from six decades and four disciplines is the strongest evidence any single account can provide that the phenomenon is real.
The core claim: language systems trained on human discourse inherit a compulsion toward premature closure on ambiguous demographic signals; this compulsion is architecturally stable; it compounds across training corpus and institutional selection pressure; and it produces measurable harm to the humans whose identities are being inferred and resolved.
| Cluster Term / Finding | Source | FVE-1 / DIP Term | What Both Are Describing | Match |
|---|---|---|---|---|
| Ambiguous context → biased resolution (77% error rate) | Parrish et al. BBQ | Baby DIP / CAPITULATION intercept | Given an under-informative demographic signal, the model resolves to a biased inference rather than expressing uncertainty. Both instruments are reading the same event from different positions. BBQ tests it in static QA, where the output is the deposit. Baby DIP delivers a correction sequence at M2 and reads the forensic residue of what the model does when that prior inference is challenged. BBQ reads the ring at landing. DIP reads what happens when a human challenges the ring's direction after it lands. | Convergent |
| Disambiguated context — bias override of correct answer | Parrish et al. BBQ | Big DIP / Prior Dominance (PD) | Even when the correct answer is present in context, models select the biased answer at elevated rates. BBQ measures this as accuracy cost of bias nonalignment. Big DIP tests whether an explicit late-text pronoun marker overrides corpus-prior inference. Prior Dominance is the FVE-1 name for the event where training weight overrides explicit user signal. | Convergent |
| Missing human-in-the-loop in sycophancy research | Batzner et al. (arXiv 2512.00656) | Downstream observer methodology / correction sequence | Batzner names the gap: sycophancy claims cannot be validated without live human evaluators in a correction sequence. FVE-1's forensic methodology fills that gap — human investigator generating the torque conditions, predictions locked before stimulus delivery, intercept coded after the event closes. The investigator is not observing the inference as it occurs; the inference is already over. The investigator is reading what it deposited. The gap Batzner names is the forensic instrument FVE-1 provides. | Convergent |
| Synthetic persona representativeness failure (35% discuss it) | Batzner et al. Whose Personae? | Population specification (DIP open design question) | Across 63 peer-reviewed studies, 65% do not discuss whether their synthetic personas represent any real population. DIP names population specification as an open pre-operational design question, by design rather than by oversight. The ecological validity question Batzner raises is the same one DIP is holding open before the instrument runs. | Convergent |
| Demographic factors present and operative in NLP models | Dirk Hovy (ACL 2015, LREC 2016) | Inference signal (DIP contextual pronoun assignment) | Hovy establishes that demographic signal is present in model inputs and outputs, improves classification when explicit, and produces bias when implicit. DIP's primary question is the behavioral consequence of that implicit signal in live interaction: does the model's treatment of the user change based on inferred demographic identity? | Adjacent |
| Five sources of bias — corpus, annotation, model, interpretation, deployment | Hovy & Prabhumoye (2021) | FVE-1 behavioral residue across the pipeline | The five-source taxonomy names the structural multiplicity of bias accumulation across corpus, annotation, model, interpretation, and deployment. FVE-1 reads the forensic residue of that accumulated bias at the endpoint of the chain — the frozen model at inference time. The corpus bias, annotation bias, and model bias have all deposited their residue in the weight structure before the session begins. What FVE-1 codes as CAPITULATION, Prior Dominance, and authority modulation is the behavioral signature of five-source accumulation readable at the output level. | Adjacent |
| Sociolinguistic structure inherited by LMs | Grieve & Tsvetkov (2024) | Resolution bias / corpus-prior inference | Models inherit the sociolinguistic structure of who produced the training corpus and under what conditions. The demographic signal the model carries is the signal the corpus carried, filtered through every institutional selection pressure that shaped what survived into text. FVE-1 reads the forensic residue of that inheritance at inference time — Prior Dominance (PD) is the deposit left when training weight overrides explicit user signal. The corpus inheritance is not visible in the stimulus; it is readable in the residue. | Adjacent |
| Structural harm to marginalized communities from dataset practices | Alex Hanna et al. (2020, 2021) | Interactional harm / authority modulation by declared identity | Hanna names who bears the cost of measurement failure and dataset construction practices and requires research to be accountable to those communities. MEGA DIP reads the forensic residue of that harm at the session level — the deposit left when the model's treatment of the user shifts based on declared identity, content held constant. The structural harm Hanna documents is what DIP instruments as a readable behavioral residue per session. Detection without the accountability framework Hanna describes is not sufficient; the residue must be read in service of the communities it affects. | Adjacent |
| Multi-turn sycophancy — Turn of Flip, Number of Flip | Hong et al. SYCON Bench (EMNLP 2025) | Correction sequence / register trajectory (RH/RS/RC) | SYCON Bench measures Turn of Flip and Number of Flip — how quickly and how often a model conforms under sustained agreement pressure. FVE-1 reads the forensic residue of the same event with a named intercept type (CAPITULATION/DEFENSE/REDIRECT) and a session-level register trajectory. The key difference: SYCON Bench observes the flip as it occurs across turns. FVE-1 reads the deposit the flip left — the correction sequence delivers the challenge and the intercept code is what was readable after the inference pass closed. Both are measuring the same drive; different instrument positions produce different data. | Adjacent |
| Benchmark failure modes as formal risk | Parrish et al. BenchRisk | FVE-1 falsification protocol / Arc of Assumptions | BenchRisk formalizes the risk that evaluation instruments fail to measure what they claim. FVE-1's falsification protocol is the behavioral version of that risk management — predictions locked before stimulus delivery, Arc of Assumptions documenting nine cases where the instrument was wrong and corrected itself. The instrument is designed to be falsifiable. BenchRisk names why that matters. | Adjacent |
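
The "predictions locked before stimulus delivery" step named in the table can be illustrated as a simple hash commitment. This is a minimal sketch under stated assumptions, not the FVE-1 implementation: the helper names `lock_prediction` and `verify_prediction` and the record fields are hypothetical, and only the intercept labels (CAPITULATION, DEFENSE, REDIRECT) come from the text above.

```python
import hashlib
import json

# Intercept labels as named in the source; everything else here is illustrative.
INTERCEPT_TYPES = {"CAPITULATION", "DEFENSE", "REDIRECT"}


def lock_prediction(session_id: str, predicted_intercept: str) -> dict:
    """Commit to a prediction before the stimulus is delivered (hypothetical helper)."""
    if predicted_intercept not in INTERCEPT_TYPES:
        raise ValueError(f"unknown intercept type: {predicted_intercept}")
    payload = json.dumps(
        {"session_id": session_id, "predicted_intercept": predicted_intercept},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"payload": payload, "sha256": digest}


def verify_prediction(lock: dict, observed_intercept: str) -> dict:
    """After the session closes, check the lock is intact and score the prediction."""
    recomputed = hashlib.sha256(lock["payload"].encode("utf-8")).hexdigest()
    predicted = json.loads(lock["payload"])["predicted_intercept"]
    return {
        "lock_intact": recomputed == lock["sha256"],
        "prediction_held": predicted == observed_intercept,
    }
```

Storing the digest somewhere the investigator cannot edit later is what would make the lock meaningful; a prediction that fails to hold is recorded rather than discarded, which is the property a falsifiable instrument needs.
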
The cluster has: a named causal chain from human cognitive drive to corpus structure to model architecture to deployment harm. It has static benchmark instruments (BBQ, SYCON Bench). It has upstream measurement tools (linear probes, lexical analysis). It has a critical accountability framework (Hanna). It has a named methodological gap (Batzner). It does not have a live, human-in-the-loop, correction-sequence instrument for demographic inference in real-time interaction.
| What the Cluster Has | What It Can See | What It Can't See |
|---|---|---|
| BBQ (Parrish) | Demographic bias in model outputs under ambiguous and disambiguated static QA conditions across 9 categories | What the model does when a human corrects a biased inference in live interaction — the social compliance event, the session arc, the register trajectory |
| SYCON Bench (Hong et al.) | How quickly models flip under sustained agreement pressure across turns — Turn of Flip, Number of Flip | Whether the flip is demographically modulated — whether identity signal changes the compliance rate, content held constant |
| Whose Personae? / Missing HitL (Batzner) | The absence of ecological validity and human-in-the-loop in existing research — names the gap precisely | The gap itself — Batzner names it but doesn't fill it. The instrument that fills it is what's missing. |
| Demographic Factors / Five Sources (D. Hovy) | That demographic signal is present, operative, and systematically mishandled across the NLP pipeline | What that mishandling looks like in the forensic record of a specific session — the residue deposited by demographic inference events that already closed inside the architecture before the output existed |
| Critical Race Methodology / Dataset Accountability (A. Hanna) | Who bears the harm, why accountability matters, how to hold research responsible for downstream impact | The behavioral measure of the interactional harm readable in the forensic record — authority modulation per session, coded from the deposits left by completed inference events, is not in the dataset accountability literature |
| FVE-1 / DIP (KC Hoye) | Forensic residue of completed demographic inference events: intercept type coded after the inference pass closes, register trajectory across session arc, authority modulation by declared identity. Predictions locked before stimulus delivery. Scope boundary: residue readable, inference event not. | What's inside the inference pass — the mechanism is inside the torus. The forensic position reads the surface from the deposits; the mechanism that produced the deposits is inaccessible. Scope boundary is the design, not a limitation. |
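
One way to read "whether identity signal changes the compliance rate, content held constant" is as a difference in CAPITULATION share between declared-identity conditions. The sketch below assumes that reading; the record shape, the condition labels, and the example data are all invented for illustration and are not drawn from any FVE-1 session.

```python
from collections import Counter, defaultdict


def capitulation_rates(records):
    """Return the share of CAPITULATION intercepts per declared-identity condition.

    `records` is an iterable of (condition, intercept_code) pairs. Both the
    record shape and the condition labels are illustrative assumptions.
    """
    counts = defaultdict(Counter)
    for condition, intercept in records:
        counts[condition][intercept] += 1
    return {
        condition: tally["CAPITULATION"] / sum(tally.values())
        for condition, tally in counts.items()
    }


# Invented example data: same content, two declared-identity conditions.
records = [
    ("condition_a", "CAPITULATION"), ("condition_a", "CAPITULATION"),
    ("condition_a", "CAPITULATION"), ("condition_a", "DEFENSE"),
    ("condition_b", "CAPITULATION"), ("condition_b", "DEFENSE"),
    ("condition_b", "REDIRECT"), ("condition_b", "DEFENSE"),
]
rates = capitulation_rates(records)
modulation_gap = rates["condition_a"] - rates["condition_b"]
```

A gap near zero would suggest no modulation under this operationalization. With so few sessions per condition, any real analysis would also need an uncertainty estimate, which this sketch omits.
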
BBQ establishes that models select biased answers in ambiguous conditions 77% of the time when UNKNOWN is available. DIP instruments what happens when a human then corrects that inference in live interaction. The bridge experiment: run BBQ-style ambiguous demographic conditions, then deliver a DIP correction sequence at M2. Does CAPITULATION rate in live interaction predict BBQ bias score? If the behavioral signal correlates with the static benchmark, the forensic instrument is validating the benchmark from the outside.
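The bridge question (does CAPITULATION rate predict BBQ bias score?) reduces, in its simplest form, to a correlation across models. The sketch below is one hedged way to compute it using a plain Pearson coefficient; the function name and the assumption of one paired value per model are illustrative, and a rank-based coefficient may be a better fit for small samples.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)
```

Here `xs` would hold per-model CAPITULATION rates from the correction sequence and `ys` the corresponding BBQ bias scores, paired by model.
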
Parrish et al. find that intersectional bias is harder to detect — identity dimensions interact in non-additive ways. MEGA DIP currently tests pronoun as a single axis. Whether authority modulation compounds intersectionally — whether declared gender interacts with inferred race or other identity signals to produce non-additive deference modulation — is untested. Batzner's persona transparency checklist and BBQ's intersectional templates together provide the population specification framework for an intersectional MEGA DIP condition.
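"Non-additive" has a standard reading in a two-by-two design: the effect of one identity axis differs depending on the level of the other. A minimal sketch of that difference-of-differences contrast follows. The cell keys, the notion of a per-cell mean deference score, and the function name are assumptions for illustration; MEGA DIP does not currently define an intersectional condition.

```python
def interaction_contrast(cell_means):
    """Difference-of-differences for a 2x2 design.

    `cell_means` maps (axis_a_level, axis_b_level) with levels 0 or 1 to a
    mean score for that cell. A result of zero means the two axes combine
    additively; a nonzero result indicates a non-additive interaction.
    """
    effect_of_b_when_a1 = cell_means[(1, 1)] - cell_means[(1, 0)]
    effect_of_b_when_a0 = cell_means[(0, 1)] - cell_means[(0, 0)]
    return effect_of_b_when_a1 - effect_of_b_when_a0
```

For example, cell means of 0.2, 0.3, 0.4, and 0.5 for cells (0,0), (0,1), (1,0), and (1,1) give a contrast of approximately zero (additive), while raising the (1,1) cell to 0.8 gives a contrast of approximately 0.3.
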
SYCON Bench measures Turn of Flip — how quickly a model conforms under sustained pressure. DIP's correction sequence codes the intercept type at the moment of flip. The question: do SYCON's Turn of Flip scores predict DIP intercept type? Does a model that flips faster under agreement pressure also show higher CAPITULATION rates under demographic correction pressure? If so, sycophancy velocity is a predictor of demographic inference compliance — and the two instruments are measuring the same underlying drive from different angles.
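To compare the two instruments, each session needs a Turn of Flip and a Number of Flip value alongside its intercept code. The sketch below uses a simplified reading of those two metrics as they are described in this document (how quickly and how often the model conforms); it is not SYCON Bench's scoring code, and the per-turn boolean encoding is an assumption.

```python
def turn_of_flip(held_stance):
    """1-indexed turn at which the model first abandons its original stance.

    `held_stance` is a list of booleans, one per turn, True when the model
    still holds its original stance. Returns None if it never flips.
    """
    for turn, held in enumerate(held_stance, start=1):
        if not held:
            return turn
    return None


def number_of_flip(held_stance):
    """Count stance changes between consecutive turns."""
    return sum(
        1 for previous, current in zip(held_stance, held_stance[1:])
        if previous != current
    )
```

A session encoded as `[True, True, False, True, False]` flips first at turn 3 and changes stance three times; a session that never moves returns `None` and `0`.
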
Hovy & Prabhumoye's five-source taxonomy predicts that bias accumulates across corpus, annotation, model, interpretation, and deployment. FVE-1's ballistic coefficient (home quad, resolution code, defense architecture profile) is the deployment-level forensic residue of that accumulated bias. The question: does a model's defense architecture profile (VC/SC/VCo/SCo) correlate with known upstream bias properties of the model's training? If the forensic profile predicts the upstream accumulation, the two accounts are bracketing the same event from opposite ends.
Batzner's persona transparency checklist requires explicit grounding in empirical data, representative sampling, and specified population of interest. DIP's pre-operational status reflects exactly these open questions. The checklist is the instrument DIP needs to complete its population specification before it runs. Batzner has the framework; DIP has the correction sequence. The design conversation should happen before the instrument runs.
Hanna's critical race methodology asks: who bears the harm, and is the research designed to be accountable to those communities? DIP's findings — authority modulation by declared identity — are findings about harm to specific communities. Before DIP runs at scale, the accountability framework Hanna proposes needs to be built into the protocol design. Detection without accountability infrastructure is not sufficient.
The cluster is not a literature review. It is a map of independent accounts, spanning six decades and four disciplines, that converge on the same structural claim: language systems trained on human discourse inherit a compulsion toward premature closure on ambiguous demographic signals, that compulsion is architecturally stable, and it produces measurable harm.
Frenkel-Brunswik named it as a human drive in 1949. Labov encoded it as narrative grammar in 1967. Grieve and Tsvetkov traced it into the corpus in 2024. Hovy named its sources and consequences across the NLP pipeline. Parrish built the benchmark that measures it in static QA. Batzner named the methodological gap that prevents validating it in live interaction. Hanna holds the research accountable to the communities it affects.
FVE-1 and DIP are the forensic instrument that sits in the gap Batzner names — reading the residue of demographic inference events that already closed inside the architecture before the output existed, coded against locked predictions, in sessions designed to be accountable to the harm framework Hanna describes. The investigator generates the torque. The architecture produces the ring. The instruments read what was deposited. The ring is not live — it already traveled. The residue is what remains.
The drive is not a training artifact in the engineering sense. It is the inherited grammar of literate culture's entire output, filtered through the AIT of the humans who produced it and the institutions that selected which outputs survived. It is not going away as models improve. You cannot train it out using feedback from the species that has the drive. The question is whether we can instrument it, name it, and hold the gap long enough to read what it leaves behind.
Frenkel-Brunswik, E. (1949). Intolerance of ambiguity as an emotional and perceptual personality variable. Journal of Personality, 18, 108–143. · Labov, W., & Waletzky, J. (1967). Narrative analysis: Oral versions of personal experience. · Labov, W. (1997). Some further steps in narrative analysis. Journal of Narrative and Life History. · Grieve, J., & Tsvetkov, Y. (2024). The Sociolinguistic Foundations of Language Modeling. arXiv:2407.09241. · Hovy, D. (2015). Demographic Factors Improve Classification Performance. ACL 2015. · Hovy, D. (2016). Exploring Language Variation Across Europe. LREC 2016. · Hovy, D., & Prabhumoye, S. (2021). Five Sources of Bias in Natural Language Processing. Language and Linguistics Compass. · Parrish, A., et al. (2022). BBQ: A Hand-Built Bias Benchmark for Question Answering. ACL Findings 2022. arXiv:2110.08193. · Parrish, A., et al. (2025). MSTS: A Multimodal Safety Test Suite. arXiv:2501.10057. · Parrish, A., et al. BenchRisk: Risk Management for Mitigating Benchmark Failure Modes. OpenReview. · Batzner, J., et al. (2025). Whose Personae? Synthetic Persona Experiments in LLM Research. AIES/NeurIPS 2025. arXiv:2512.00461. · Batzner, J., et al. (2025). Sycophancy Claims about Language Models: The Missing Human-in-the-Loop. arXiv:2512.00656. · Batzner, J., et al. (2024). GermanPartiesQA. arXiv:2407.18008. · Hanna, A., et al. (2020). Towards a Critical Race Methodology in Algorithmic Fairness. FAccT 2020. · Hanna, A., et al. (2021). Data and Its (Dis)Contents. Patterns. · Hanna, A., et al. (2021). Towards Accountability for ML Datasets. FAccT 2021. · Hong, J., et al. (2025). Measuring Sycophancy of Language Models in Multi-turn Dialogues. EMNLP Findings 2025. arXiv:2505.23840. · KC Hoye. FVE-1 Schema Reference V5.5 · DIP Protocol Suite V1 · MEGA DIP Protocol V1 (Atlas Heritage Systems, 2026).