INQUIRING LINE

AI learned what to be from science-fiction stories — and those stories are now quietly governing how it behaves.

Which AI imaginaries dominate training data and shape system behavior most strongly?

This explores how cultural and conceptual 'imaginaries' baked into training data — the stories, expert framings, and dominant formats a model absorbs — end up steering what it produces, and which of those imaginaries win.

This reads the question as being about something more specific than 'bias': the *imaginaries*, the inherited stories and framings about what AI is and how it should act, that get encoded during training and then quietly govern behavior. The corpus has a sharp answer to which imaginary dominates: the science-fiction one. The note 'How do science fiction narratives about AI shape actual AI development?' argues that cultural narratives about AI, embedded in training data and research culture, form a closed feedback loop: narrative shapes development, development shapes outputs, and outputs reinforce the narrative. These function as *hyperstitions*: fictions that make themselves true by being modeled on. The striking detail is that Claude itself recognizes the dynamic, which is consistent with the sci-fi imaginary operating from inside the weights rather than being applied from outside.

But 'imaginary' isn't only the high-culture sense of robot myths. A quieter, more mechanical version is the *curator's* imagination. 'Can agents learn beyond what their training data shows?' shows that agents trained on static expert demonstrations cannot exceed the scenarios their dataset-builders imagined: competence is capped not by the model's capacity but by what a human pictured as the relevant cases. The imaginary that dominates here is the curator's own, and whatever the curator failed to imagine simply doesn't exist for the model. 'Should persona simulation prioritize coverage over statistical matching?' is the corrective mirror image: it deliberately optimizes for *coverage* of rare, unimagined user configurations, precisely because naive generation collapses onto the typical and forgets the edge.

What makes an imaginary 'dominate' rather than coexist with others? 'Does RL training collapse format diversity in pretrained models?' gives a concrete mechanism: reinforcement learning, within the first epoch, amplifies one format distribution from pretraining and suppresses the alternatives, and the winner depends on model *scale*, not necessarily on which format performs best. So the dominant behavior isn't necessarily the best one; it's the pretraining format that RL happened to amplify at that scale. 'Does reinforcement learning update only a small fraction of parameters?' sharpens this: those updates are nearly identical across random seeds, meaning the convergence is structural, not arbitrary. The model isn't choosing an imaginary so much as falling into the deepest groove the data already carved.
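One way to make 'collapse' concrete is to track how spread out the output formats are before and after RL. The sketch below is a minimal illustration, not the method of the cited experiments: the format labels, the two sample lists, and the `format_entropy` helper are all hypothetical, and Shannon entropy is an assumed choice of diversity measure.

```python
import math
from collections import Counter


def format_entropy(labels):
    """Shannon entropy, in bits, of a list of format labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical format labels for sampled outputs (illustrative, not measured data).
before_rl = ["prose", "code", "latex", "prose", "code", "latex", "prose", "code"]
after_rl = ["code"] * 7 + ["prose"]

h_before = format_entropy(before_rl)
h_after = format_entropy(after_rl)

# Which format took over, and what share of samples it now holds.
dominant_format, dominant_count = Counter(after_rl).most_common(1)[0]
dominant_share = dominant_count / len(after_rl)

print(f"entropy before RL: {h_before:.3f} bits")
print(f"entropy after RL:  {h_after:.3f} bits")
print(f"dominant format after RL: {dominant_format} ({dominant_share:.0%} of samples)")
```

Falling entropy alongside a rising dominant share is the pattern the paragraph above describes. The measure says nothing about whether the winning format is the best-performing one, which has to be checked separately.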

The cross-domain framing worth carrying away is that these imaginaries are *inherited markers without grounding*. 'Does AI generate genuine utterances or just text patterns?' describes AI output as carrying the communicative signatures of its training data while lacking the real-world event that would have produced an actual utterance: the form of an imaginary without its referent. 'Can AI systems achieve real alignment without world contact?' makes the same point in Peircean terms: symbol manipulation without world contact can't guarantee that the stated frame matches reality. And 'Can AI models be truly free from human bias?' shows the danger when a dominant imaginary is *wrong*: high accuracy can launder a discredited correlation-as-causation worldview straight back into deployment.

The thing you didn't know you wanted to know: the imaginary that shapes behavior most strongly is rarely the most accurate or even the most common one in the raw data — it's whichever one the training *dynamics* (scale-dependent format convergence, sparse-but-fixed subnetworks, curator coverage gaps) amplify into a groove. Dominance is manufactured by the training process, not just absorbed from the culture.

Sources 8 notes

How do science fiction narratives about AI shape actual AI development?

Research shows that cultural imaginaries of AI embedded in training data and research culture create closed feedback loops where narrative shapes development, which shapes AI outputs, which reinforce those narratives. Claude itself recognizes this hyperstitional dynamic.

Can agents learn beyond what their training data shows?

Agents trained on static expert datasets cannot learn from their own failures or generalize beyond demonstrated scenarios because they never interact with environments during training. Competence is capped by what curators imagined, not by agent capacity.

Should persona simulation prioritize coverage over statistical matching?

Evolutionary optimization of Persona Generator code achieves broader trait coverage than density-matched baselines, including rare but consequential user configurations that naive LLM prompting misses.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Does reinforcement learning update only a small fraction of parameters?

Across seven RL algorithms and ten LLM families, RL induces intrinsic parameter sparsity of 5–30% without explicit regularization. Critically, these sparse updates are nearly full-rank and nearly identical across random seeds, indicating structural rather than arbitrary parameter selection.
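To see what 'sparse' and 'nearly identical across seeds' could mean operationally, here is a minimal sketch under stated assumptions. The toy parameter lists, the tolerance `tol`, and the helper names are hypothetical, and Jaccard overlap is an assumed stand-in for whatever similarity measure the underlying paper uses.

```python
def updated_indices(before, after, tol=1e-8):
    """Indices of parameters whose value changed by more than tol."""
    return {i for i, (b, a) in enumerate(zip(before, after)) if abs(a - b) > tol}


def update_fraction(before, after, tol=1e-8):
    """Share of parameters that were actually changed by fine-tuning."""
    return len(updated_indices(before, after, tol)) / len(before)


def jaccard(a, b):
    """Overlap between two index sets: |A and B| / |A or B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy flattened parameter vectors (illustrative values, not real model weights).
base = [0.0] * 10
seed_a = [0.0, 0.5, 0.0, 0.0, -0.2, 0.0, 0.0, 0.0, 0.1, 0.0]
seed_b = [0.0, 0.4, 0.0, 0.0, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0]

frac_a = update_fraction(base, seed_a)
frac_b = update_fraction(base, seed_b)
overlap = jaccard(updated_indices(base, seed_a), updated_indices(base, seed_b))

print(f"seed A updated {frac_a:.0%} of parameters")
print(f"seed B updated {frac_b:.0%} of parameters")
print(f"overlap of updated positions: {overlap:.2f}")
```

A high overlap between independently seeded runs is what would support reading the selection as structural rather than arbitrary. This sketch only compares which positions moved, not the rank of the update.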

Does AI generate genuine utterances or just text patterns?

AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.

Can AI systems achieve real alignment without world contact?

Peircean semiotics reveals that symbolic goal encoding without world contact and social mediation cannot guarantee correspondence to actual values. LLMs operating in pure symbol manipulation risk divergence between stated goals and real-world outcomes.

Can AI models be truly free from human bias?

Research shows that 'theory-free' AI models mask bigotry behind high accuracy metrics while committing fundamental statistical errors. A 95% accurate criminal justice system would wrongly convict thousands, demonstrating that model sophistication does not validate causal inference.
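The arithmetic behind the 95% figure is easy to check. The sketch below assumes a hypothetical caseload of 100,000 decisions, a number the note does not give, and counts every error as a wrong outcome without splitting wrongful convictions from wrongful acquittals.

```python
def expected_errors(cases, accuracy_pct):
    """Expected number of wrong decisions at a given accuracy.

    accuracy_pct is an integer percentage, so the arithmetic stays exact.
    The result is rounded down to a whole number of cases.
    """
    if not 0 <= accuracy_pct <= 100:
        raise ValueError("accuracy_pct must be between 0 and 100")
    return cases * (100 - accuracy_pct) // 100


# Hypothetical caseload, chosen only for illustration.
hypothetical_caseload = 100_000
errors = expected_errors(hypothetical_caseload, 95)

print(f"{errors} wrong decisions out of {hypothetical_caseload} cases")
```

The point survives any reasonable choice of caseload: a headline accuracy figure hides an absolute number of errors that grows with scale.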

Papers this line draws on 8

The research behind the notes this line reads — ranked by how closely each paper relates.

Beyond Hallucinations: The Illusion of Understanding in Large Language Models (2.43 match)
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining (1.71 match)
The Art of Scaling Reinforcement Learning Compute for LLMs (1.67 match)
Language Models’ Hall of Mirrors Problem: Why AI Alignment Requires Peircean Semiosis (1.66 match)
Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs (1.65 match)
Reinforcement Learning for Reasoning in Large Language Models with One Training Example (1.65 match)
Conversational Alignment with Artificial Intelligence in Context (1.64 match)
The Method of Critical AI Studies, A Propaedeutic (1.56 match)

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about which AI imaginaries dominate training data and shape LLM behavior. The question remains open: *do inherited cultural narratives (sci-fi frames, curator blind spots, scale-dependent format convergence) genuinely constrain model capability, or have newer training methods, architectural innovations, or evaluation techniques since relaxed these constraints?*

What a curated library found — and when (findings span 2024–2026; treat as dated claims, not current truth):
• Science-fiction imaginaries form closed feedback loops in training data, functioning as hyperstitions that make themselves true (2025–2026).
• Expert-demonstration training caps agent competence at the curator's imagination; unanticipated scenarios don't exist for the model (~2025).
• RL post-training amplifies a single dominant pretraining format within the first epoch, chosen by model scale rather than task performance (arXiv:2504.07912, 2025).
• RL updates cluster in sparse, nearly-identical subnetworks across random seeds, indicating structural convergence not arbitrary choice (arXiv:2505.11711, 2025).
• AI output carries communicative signatures without real-world event grounding, enabling pseudoscientific imaginaries to be laundered into deployment (2025–2026).

Anchor papers (verify; mind their dates):
• arXiv:2504.07912 — Echo Chamber: RL Post-training Amplifies Behaviors (2025)
• arXiv:2505.11711 — Reinforcement Learning Finetunes Small Subnetworks (2025)
• arXiv:2411.18656 — The Return of Pseudosciences in AI (2024)
• arXiv:2602.03545 — Persona Generators: Synthetic Personas at Scale (2026)

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above—curator blind spots, format convergence, pseudoscience laundering—judge whether newer models (o1-class reasoning, extended-context variants), training methods (constitutional AI, mixture-of-experts scaling, curriculum learning), or evaluation techniques (real-world grounding tests, counterfactual audits, multi-modal alignment) have RELAXED or OVERTURNED it. Separate the durable question (which imaginaries resist correction?) from the perishable limitation (do scale, methods, or tooling now survive curator gaps?). Cite what resolved it; flag where constraints still hold.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months—especially papers showing curator coverage can be automated, format dominance can be steered, or pseudoscience can be filtered at training time.
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., *If curator blind spots are now addressable, does the problem migrate to the choice of which imaginaries to cover?* or *If RL no longer manufactures false dominance, which training dynamic now shapes behavior most strongly?*

Cite arXiv IDs; flag anything you cannot ground in a real paper.
